Abstract

As an alternative to parsimony analyses, stochastic models have been proposed 
(Lewis, 2001; Nylander et al., 2004) for morphological characters, so that max- 
imum likelihood or Bayesian analyses may be used for phylogenetic inference. 
A key feature of these models is that they account for ascertainment bias, in 
that only varying, or parsimony-informative characters are observed. However, 
statistical consistency of such model-based inference requires that the model 
parameters be identifiable from the joint distribution they entail, and this issue 
has not been addressed. 

Here we prove that parameters for several such models, with finite state 
spaces of arbitrary size, are identifiable, provided the tree has at least 8 leaves. 
If the tree topology is already known, then 7 leaves suffice for identifiability of 
the numerical parameters. The method of proof involves first inferring a full 
distribution of both parsimony-informative and non-informative pattern joint 
probabilities from the parsimony-informative ones, using phylogenetic invari- 
ants. The failure of identifiability of the tree parameter for 4-taxon trees is also 
investigated.

1. Introduction

Currently, the vast majority of phylogenetic inference from morphological 
data is based on the parsimony criterion - the preference for the phylogeny that 
can explain the data with the fewest number of changes in character states. 
Lewis (2001) discussed the obstacles to the application of Markov models for 
phylogenetic inference to discrete morphological data. He argued that, despite 
its limitations, the simplest continuous-time Markov model offers advantages 
over relying solely on parsimony. He referred to this model as the Mk model,
for "Markov" with k states. The Mk model is a generalization of some of the
earliest models used in phylogenetics by Jukes and Cantor (1969), Neyman (1971),
Farris (1973), and Cavender (1978); it assumes that all states have the same
frequency and all transitions between different states occur at the same rate. 
Maximum likelihood (ML) inference under the Mk model should be able to infer 
trees more accurately than parsimony, because use of the Mk model allows one
to take into account that some branches on the tree may be longer than others. 
This branch length heterogeneity could arise from differences in the temporal 
duration of branches, differences in the rate of character evolution, or both. 
Parsimony does not attempt to correct for the fact that convergent changes 
may not be equally likely to occur on every branch of the tree; as a result, it has 
been shown to be susceptible to "long-branch attraction" when branch-length 
heterogeneity is present (Felsenstein, 1978). 

As Lewis (2001) pointed out, complications arise when applying the Mk
model to morphological characters. The definitions of both characters (also 
referred to as "transformation series") and character states are problematic in 
morphological systematics. Systematists disagree about the most appropriate 
meanings of concepts such as homology (cf. Sereno, 2007; Rieppel and Kearney,
2007; Wiley, 2008) which are crucial to character coding. Even if one chooses 
a particular definition of homology, doing so does not establish a clear-cut set 
of rules for "atomizing" the complete morphology of an organism to a set of 
characters that can be treated as independent instances of a general model of 
character evolution (but see Ramirez, 2007, for one example of an attempt to 
create an objective system for character coding). While there are rarely clear 
criteria for delimiting character states or identifying homologous traits between 
species, systematists try to find traits that can be cleanly scored into one of a 
few discrete bins. For example, the degree of contact of bones in the skull could be
scored as a 2-state character with state 0 representing "not touching" and state 1
representing "in contact." If these characters are heritable and do not change too
quickly over evolutionary time, then even a simple, discrete-state coding scheme
can provide information about evolutionary relationships. Of particular concern
here is the fact that variation across the taxa under investigation is vital to the 
process of recognizing characters and character states. This means that it is 
not appropriate to view the coded character matrix as a random sampling of 
characters generated by the evolutionary process, since it is biased to contain 
characters thought to be phylogenetically useful. Such ascertainment bias, in 
which the collected data fails to be representative of the entire population of 
characters, must be corrected for in a valid statistical analysis. 

It is difficult to precisely describe the biases inherent in the process of the 
coding of morphological traits into columns of a data matrix. For one thing, 
it is relatively rare for systematists to even report their methods for excluding
potential characters from consideration; Poe and Wiens (2000) found that
fewer than 20% of papers in morphological systematics reported such criteria. 
The requirement that there be variability among the taxa of interest is clearly 
one important aspect of character coding. As noted by Sereno (2007), several 
definitions of "character" or "homology" given by systematists include the idea
that a character differentiates between taxa. For example, he pointed out the
following definitions of "character" in systematics: 

• "Any attribute of an organism or a group of organisms by which it dif- 
fers from an organism belonging to a different category or resembles an 
organism of the same category" (p. 315, Mayr et al., 1953) 

• "We will call those peculiarities that distinguish a semaphoront (or a group
of semaphoronts) from other semaphoronts 'characters'. . ." (p. 7, Hennig,
1966)

• "an observation that captures distinguishing peculiarities among organ- 
isms . . ." (Rieppel and Kearney, 2002) 

The emphasis on variability among taxa means that constant characters gener- 
ally do not appear in morphological character matrices. When they do occur, 
it is often the result of pruning the list of taxa (the characters had been chosen 
because of variation among members of a larger set of taxa). As Lewis (2001) 
noted, this bias cannot be corrected by morphologists changing their systems 
for encoding characters. How many constant characters should be encoded to 
represent a complex feature that is identical across a set of taxa? The question 
seems intractable because there are no strict rules about how many aspects of 
a trait should be coded as independent characters. 

The absence of constant characters is the most obvious effect of the ascer- 
tainment biases in the coding of morphological data matrices. Lewis (2001) 
proposed a corrected model, Mkv, which can be used to calculate a likelihood 
from a data set conditional on the fact that only variable characters are sampled 
(see also Felsenstein, 1992). 

Nylander et al. (2004) further noted that many morphological matrices only 
contain parsimony-informative characters. A parsimony-informative character 
is one which does not have the same parsimony score on every tree. In order for 
a character to be parsimony-informative, it must have more than one character 
state that is shared by multiple taxa. For instance, if a character 'wing shape' 
for a collection of insects has 3 states, and there is exactly one taxon with shape 
2 and one with shape 3, with all others having shape 1, then the character 
is variable, but not parsimony-informative. If, on the other hand, at least
two taxa have shape 1 and at least two taxa have shape 2, then the character is
parsimony-informative.
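
The criterion just described can be checked mechanically. The following Python sketch is only an illustration: the function name `is_parsimony_informative` and the two example codings are ours, and it assumes every taxon is scored with a single unambiguous state for an unordered character.

```python
from collections import Counter

def is_parsimony_informative(pattern):
    """Return True if at least two states each occur in at least two taxa."""
    counts = Counter(pattern)
    shared_states = [state for state, c in counts.items() if c >= 2]
    return len(shared_states) >= 2

# Wing-shape example: one taxon with shape 2, one with shape 3, the rest shape 1.
variable_only = [1, 1, 1, 1, 2, 3]
# At least two taxa with shape 1 and at least two taxa with shape 2.
informative = [1, 1, 2, 2, 3, 1]

print(is_parsimony_informative(variable_only))
print(is_parsimony_informative(informative))
```

The first coding is variable but not parsimony-informative, while the second is parsimony-informative, matching the wing-shape example in the text.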

If parsimony-noninformative characters are avoided in the process of character
coding, then one has data from a smaller set of characters than for the Mkv
model and should condition the likelihood calculations based on the
parsimony-informative ascertainment bias. We will refer to this model as Mkpars-inf. It was
introduced by Nylander et al. (2004), and implemented in the freely available
software, MrBayes (Ronquist and Huelsenbeck, 2003).

The use of the Mkv and Mkpars-inf models in phylogenetic inference has grown
steadily. Dozens of studies using these models have now been published.
Unfortunately, in many cases, authors do not report which form of conditioning is
used during analyses, so it is impossible to ascertain the relative frequency of
the Mkv model compared to the Mkpars-inf model.

The statistical inconsistency of parsimony as an estimator of phylogenetic
trees was one of Lewis's (2001) primary reasons for proposing the Mkv model
as an improved basis of inference. However, Lewis (2001) did not prove that
ML inference using the Mkv model is a consistent estimator of the phylogeny.
Nor did Nylander et al. (2004) prove the consistency of ML inference under the
Mkpars-inf model.

Recall that statistical consistency of a method of inference under a model 
means that if data is generated according to the model, then as the amount 
of data grows, the probability of inferring the correct model parameters (e.g., 
the tree topology and numerical parameters such as edge lengths) approaches 1.
There is a standard approach to proving consistency under maximum likelihood 
(Wald, 1949) that reduces the issue to proving the model has identifiable pa- 
rameters. This means the crucial step is to show that any two different choices 
of model parameters lead to a different distribution of data. Identifiability of 
model parameters is equally essential for their inference in Bayesian analyses, 
and non-identifiability of parameters that are not the focus of such an analysis
may also be problematic (Rannala, 2002). 

The identifiability of the Mk model can be proved through arguments based 
on an appropriate generalization of the Jukes-Cantor distance. However, the 
question that we address is whether we can identify the tree and model pa- 
rameters when the data is filtered to contain only variable patterns, or only 
parsimony-informative patterns. This filtering greatly changes the problem, so
that a straightforward modification of the proof for Mk to Mkpars-inf fails. One
instance of the question of tree identifiability using only parsimony-informative 
pattern frequencies was investigated by Steel et al. (1993). Although that paper 
is focused on other issues, in the appendix it is shown that for the CFN model 
on 4-taxon trees, several explicit choices of edge lengths on different tree topolo- 
gies can lead to identical distributions of parsimony-informative (and constant) 
patterns. 

Here we demonstrate that the tree topology is identifiable under the Mkv
model under sufficiently broad circumstances to justify its use in data analysis.
While we show that under the Mkpars-inf model, $k \ge 2$, the tree topology is
not identifiable for 4-taxon trees, more importantly we establish that the tree 
topology is identifiable when eight or more taxa are involved. Moreover, if the 
tree is known, then the branch lengths are identifiable on trees of seven or more 
taxa. (The need for seven or eight taxa in these statements may be an artifact 
of our methods; we do not fully analyze the cases of trees with five, six, or seven 
taxa.) 

Our results are actually valid for more general models, the variable-patterns-only
and parsimony-informative-patterns-only versions of the k-state general
Markov model GMk, a generalization of the Mk model in which the transition
probabilities on edges are not constrained to be equal among the different states.
The identifiability of the tree topology for the unfiltered GMk was proven by
Steel (1994), and the identifiability of all numerical parameters for this model
by Chang (1996). Identifiability results for a 2-class mixture model of GMk
with invariable sites (GMk+I) were investigated by Allman and Rhodes (2008a).
That model is quite closely related to a variable-patterns-only model, as the 
invariable class essentially makes direct observation of constant patterns from 
variable characters impossible. Our results here in fact imply strengthenings of 
some of the theorems in Allman and Rhodes (2008a). Furthermore, since the 
general Markov model includes as submodels the general time-reversible models 
with fixed rate-matrices describing the substitution process across the tree, our 
results apply to the filtered versions of those models as well. 

Interestingly, one implication of this work and that of Steel et al. (1993) 
which seems not to have been widely noticed, is that the most basic example
used to explain phylogenetic inference to students is actually an example
of an intractable problem. Assuming any model encompassing the Mk model
underlies the data, if we attempt to infer an unrooted four-leaf tree using only 
those characters that are parsimony-informative, then no method of inference 
can consistently identify the correct tree, even if given an infinite sample of 
characters. We establish that each of the three possible binary tree topologies 
can lead to all possible positive distributions of parsimony-informative patterns, 
thus strengthening the result of Steel et al. (1993). 

We emphasize that the non-identifiability in this case is not an argument 
for ignoring the ascertainment bias. If characters are filtered to contain only 
parsimony-informative patterns and the ascertainment bias is ignored, then
inference can be positively misleading in the sense that Felsenstein (1978) used
the phrase - the incorrect tree can be preferred with increasing support as the
number of characters increases. Indeed, using standard software to perform a
maximum likelihood analysis of filtered 4-taxon data under the misspecified Mk
model often results in the erroneous inference of a particular tree topology.
While maximum likelihood inference under the correctly-specified Mkpars-inf
model does not prefer any tree topology, it will at least not lead to rejection of 
the true tree (except when some parsimony-informative patterns do not occur, 
due to sampling error). 

We also note that the Mkv and Mkpars-inf models may be appropriate in
contexts outside of morphological systematics. For example, one (admittedly 
flawed) method for incorporating information from insertion/deletion events 
(indels) in a molecular sequence analysis is to code the absence or presence of 
a base as a 0/1 character. Because columns without indels are generally not 
coded, and columns in which all taxa lack a nucleotide are impossible to correctly 
code, such binary characters should be analyzed under a model that conditions 
on the variability of the characters. (More appropriate ways of modeling indels 
are discussed by Thorne et al. (1991) and Diallo et al. (2007).) In a similar vein, 
one of the models included in our analysis, GM2v, the variable-patterns-only 
version of the model GM2, has recently been used for a likelihood analysis of 
intron loss and gain by Csűrös et al. (2007).

Finally, we emphasize that while establishing identifiability of parameters 
for a model is essential for its use in statistical inference, there are other
important issues that we do not address in this work. In particular, efficiency
concerns how many characters are needed for inference by a particular method
such as maximum likelihood to perform well, and robustness concerns how well
the method performs on data deviating from the assumed model. Even for
unfiltered phylogenetic models these questions have mainly been investigated by
simulation, rather than theoretically. 

2. Parsimony-informative models 

The models of sequence evolution we consider are submodels of the general 
Markov model, with observations restricted to variable or parsimony-informative 
patterns. In this section, we make this more precise. 

By an n-taxon tree T, we mean an unrooted, n-leaf, topological phylogenetic 
tree, with leaves labeled by the taxa $a_i$, $i \in [n]$. We do not assume the tree T is
binary; it need not be fully resolved. However, we do assume T has no internal 
nodes of valence 2. 

The k-state general Markov substitution model, GMk, on T is parameterized
as follows: First, arbitrarily choose some node of T to be the root.
Designating character states by elements of $[k] = \{1, 2, \dots, k\}$, a row vector
$\pi = (\pi_i) \in [0,1]^k$, with entries summing to 1, gives probabilities of each state
$i \in [k]$ occurring at the root. On each edge e of T, directed away from the
root, a $k \times k$ Markov matrix $M_e$, with rows summing to 1, gives conditional
probabilities of each possible state change occurring on that edge. We refer to
the entries of $\pi$ and the $M_e$ as the numerical parameters of GMk, in contrast
to the tree parameter T, which is non-numerical.

Throughout, we assume that 

(1) all entries of $\pi$ and the $M_e$ are strictly positive, and

(2) all $M_e$ are non-singular.

Condition (1) is a biologically natural one, implying that all states and all state 
transitions can occur. It also ensures that the probability distribution arising
for one choice of the root of T and numerical parameters is identical to one
for any other choice of the root, with a corresponding appropriate choice of
numerical parameters that are unique up to permutations of character states at
internal nodes of T (Steel et al., 1994; Allman and Rhodes, 2003). This means
that identifiability of numerical parameters for GMk can only be claimed up to
the arbitrary choices of the root and orderings of states at internal nodes.
Condition (2), which when restricted to continuous-time models is just the requirement
that edge lengths be finite, is needed to avoid other sources of non-identifiability
(such as a situation in which all terminal edges have infinite length, so that no 
information about internal tree structure is retained in the joint distribution). 

Following Lewis (2001), we use Mk to denote the submodel of GMk which
assumes a uniform root distribution, $\pi = (1/k, \dots, 1/k)$, and that for each Markov
matrix all off-diagonal entries are equal. Thus M4 is also known as the
Jukes-Cantor (JC) model, while M2 is the Cavender-Farris-Neyman (CFN) model.
While Lewis (2001) presents a continuous-time formulation of this model, that
formulation is equivalent to the submodel of the one given here obtained by making
the additional assumption that off-diagonal matrix entries are smaller than diagonal entries.
As our methods are primarily algebraic, we do not focus on the continuous-time 
formulation. 

For the GMk (or Mk) model on a fixed n-taxon tree T, the joint probability
distribution of character states at the leaves of T can be expressed by
polynomial formulas in the entries of $\pi$ and the $M_e$. Denote a pattern of states at
the leaves of a tree by a vector $\mathbf{i} = (i_1, i_2, \dots, i_n) = i_1 i_2 \cdots i_n \in [k]^n$, where the
leaf labeled by taxon $a_j$ displays state $i_j$. We use $p_{\mathbf{i}}$ to denote the probability
of observing pattern $\mathbf{i}$ that arises from a specific model, tree, and numerical
parameters.

We wish to modify the above models to describe data that is collected only on
parsimony-informative patterns. We will not explicitly treat a variable-patterns-only
model, as the necessary modifications are straightforward.

Denote the set of parsimony-informative patterns by

$$\mathcal{I} = \{\mathbf{i} = i_1 i_2 i_3 \cdots i_n \mid i_{j_1} = i_{j_2} \neq i_{j_3} = i_{j_4} \text{ for some distinct } j_l\}.$$

For fixed k, the total number of patterns grows exponentially with n, while the
number of parsimony-noninformative ones grows only polynomially. Thus the
cardinality of $\mathcal{I}$ grows exponentially with n.
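
The growth claim can be illustrated by brute-force enumeration for small n and k. The sketch below is ours (the function name `count_informative` is hypothetical) and simply counts patterns in which at least two states are each shared by at least two taxa.

```python
from collections import Counter
from itertools import product

def count_informative(n, k):
    """Count patterns in [k]^n with at least two states each present in >= 2 taxa."""
    total = 0
    for pattern in product(range(k), repeat=n):
        counts = Counter(pattern)
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            total += 1
    return total

# For k = 2: (n, number of patterns, number of noninformative patterns).
table = [(n, 2 ** n, 2 ** n - count_informative(n, 2)) for n in range(4, 9)]
for row in table:
    print(row)
```

For k = 2 the noninformative patterns are the 2 constant ones and the 2n patterns in which a single taxon differs from the rest, so their number grows linearly in n while the total 2^n grows exponentially.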

Suppose that from a total number of M independent, identically distributed
characters described by the GMk model, we may obtain only data counts $n_{\mathbf{i}}$ for
those patterns $\mathbf{i} \in \mathcal{I}$. Since assumption (1) implies $p_{\mathbf{i}} > 0$ for all $\mathbf{i}$, we have that

$$\mathbb{P}(n_{\mathbf{i}} > 0 \text{ for some } \mathbf{i} \in \mathcal{I}) \to 1 \text{ as } M \to \infty.$$

With $N = \sum_{\mathbf{i} \in \mathcal{I}} n_{\mathbf{i}}$ the total count of observed characters, then $N \le M$ and
$\mathbb{P}(N > 0) \to 1$ as $M \to \infty$.

If we were able to observe all patterns, including parsimony-noninformative
ones, then observed pattern frequencies would be $\hat{p}_{\mathbf{i}} = n_{\mathbf{i}}/M$, which, by the
strong law of large numbers, converges to $p_{\mathbf{i}}$ almost surely as $M \to \infty$. However,
since M is unknown from data, we cannot compute $\hat{p}_{\mathbf{i}}$ directly for $\mathbf{i} \in \mathcal{I}$. We
instead introduce the observed frequencies

$$\hat{q}_{\mathbf{i}} = \frac{n_{\mathbf{i}}}{N}.$$

These are estimators for conditional probabilities, $q_{\mathbf{i}}$, that one observes $\mathbf{i}$ given
that a parsimony-informative pattern is observed. Thus

$$q_{\mathbf{i}} = \mathbb{P}(\mathbf{i} \mid \text{pattern is parsimony-informative}) = \frac{p_{\mathbf{i}}}{p}, \qquad (1)$$

where

$$p = \mathbb{P}(\text{pattern is parsimony-informative}) = \sum_{\mathbf{i} \in \mathcal{I}} p_{\mathbf{i}}.$$

Note that $\hat{q}_{\mathbf{i}} \to q_{\mathbf{i}}$ almost surely as $M \to \infty$.
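
The relationship between the unfiltered probabilities and the filtered frequencies can be illustrated by simulation. In the Python sketch below, the full pattern distribution `p` is a toy positive distribution on 4-taxon, 2-state patterns generated from random weights (an assumption for illustration, not a distribution arising from GMk); the code filters M simulated characters, forms the observed frequencies n_i / N, and compares them with q_i = p_i / p.

```python
import random
from collections import Counter
from itertools import product

def is_informative(pattern):
    counts = Counter(pattern)
    return sum(1 for c in counts.values() if c >= 2) >= 2

rng = random.Random(0)
patterns = list(product(range(2), repeat=4))

# Toy full distribution p over all 16 patterns (hypothetical, strictly positive).
weights = [rng.uniform(0.5, 1.5) for _ in patterns]
total_weight = sum(weights)
p = {pat: w / total_weight for pat, w in zip(patterns, weights)}

# Conditional probabilities q_i = p_i / p over informative patterns only.
p_inf = sum(v for pat, v in p.items() if is_informative(pat))
q = {pat: v / p_inf for pat, v in p.items() if is_informative(pat)}

# Simulate M characters, keep only informative ones, and form n_i / N.
M = 200_000
draws = rng.choices(patterns, weights=[p[pat] for pat in patterns], k=M)
counts = Counter(pat for pat in draws if is_informative(pat))
N = sum(counts.values())
q_hat = {pat: counts[pat] / N for pat in q}

max_error = max(abs(q_hat[pat] - q[pat]) for pat in q)
print(N, max_error)
```

Note that the estimate uses only N, the number of retained characters, and never the unobservable total M.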

We thus define a parameterized model, GMkpars-inf, which gives values of
the $q_{\mathbf{i}}$, $\mathbf{i} \in \mathcal{I}$, as a function of the usual GMk parameters. For any fixed tree,
explicit formulas for the $q_{\mathbf{i}}$ as rational functions of the numerical model parameters
are easily obtained. Restricting to any submodel of GMk, we similarly obtain
a parsimony-informative version of the submodel. For instance, Mkpars-inf
denotes the model describing the restriction of observations of the Mk model to
parsimony-informative patterns.

Similarly, one can define parameterized models GMkv and Mkv in which only the
non-constant patterns can be observed, by conditioning on the variableness of
patterns rather than their parsimony-informativeness.



3. Results 

As mentioned, Steel et al. (1993) showed that from parsimony-informative
patterns alone the tree topology is not identifiable for the CFN (i.e., M2) model
on a 4-taxon tree, at least for certain parameter choices. We begin by extending
this negative result to models with more character states, and to the full
parameter space. 

Consider the model Mkpars-inf on a 4-leaf tree $a_1a_2|a_3a_4$. Since there are
$3k(k-1)$ parsimony-informative patterns for the k-state model, a probability
distribution arising from this model is represented by a vector of $3k(k-1)$
probabilities. However, these vector entries are all the same for patterns of the
same form (i.e., $q_{xxyy}$ is the same for all choices of distinct states x and y, etc.).
Thus the distribution can be represented by a vector

$$Q = (Q_{xxyy}, Q_{xyyx}, Q_{xyxy}),$$

where $Q_{xxyy} = k(k-1)\,q_{xxyy}$, etc., so that

$$Q_{xxyy} + Q_{xyxy} + Q_{xyyx} = 1.$$

In 3-space, Q lies on the part of the plane $x + y + z = 1$ in the non-negative
octant. This set, the probability simplex $\Delta$, is an equilateral triangular patch,
with corners (1,0,0), (0,1,0), and (0,0,1).
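
As an illustration of this reduction, the following Python sketch computes the conditional probabilities of the parsimony-informative patterns for Mk on the tree $a_1a_2|a_3a_4$ by summing over the states at the two internal nodes, and then forms the vector Q. The function names and the particular edge probabilities are our own illustrative choices.

```python
from itertools import product

def mk_matrix(k, s):
    """k x k Mk Markov matrix: diagonal 1 - s, off-diagonal s / (k - 1)."""
    return [[1.0 - s if i == j else s / (k - 1) for j in range(k)]
            for i in range(k)]

def informative_q(k, s):
    """Conditional probabilities q_i of informative patterns; s = (s1, ..., s5),
    with s1..s4 on the pendant edges to a1..a4 and s5 on the central edge."""
    m1, m2, m3, m4, m5 = (mk_matrix(k, si) for si in s)
    p = {}
    for a1, a2, a3, a4 in product(range(k), repeat=4):
        informative = ((a1 == a2 != a3 == a4) or (a1 == a4 != a2 == a3)
                       or (a1 == a3 != a2 == a4))
        if not informative:
            continue
        # Uniform root at the node u joined to a1 and a2; v is joined to a3 and a4.
        p[(a1, a2, a3, a4)] = sum(
            m1[u][a1] * m2[u][a2] * m5[u][v] * m3[v][a3] * m4[v][a4]
            for u, v in product(range(k), repeat=2)) / k
    total = sum(p.values())
    return {pattern: value / total for pattern, value in p.items()}

k = 3
q = informative_q(k, (0.10, 0.20, 0.15, 0.05, 0.30))
scale = k * (k - 1)
Q = (scale * q[(0, 0, 1, 1)], scale * q[(0, 1, 1, 0)], scale * q[(0, 1, 0, 1)])
print(len(q), Q, sum(Q))
```

Because the Mk model is symmetric under relabeling of states, any representative pattern of each form may be used to compute the corresponding coordinate of Q.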

Theorem 1. The set of all probability distributions arising from the model
Mkpars-inf with positive probabilities of a substitution on each edge of the binary
tree $a_1a_2|a_3a_4$ is precisely the interior of $\Delta$.

As the set of probability distributions described in the theorem is indepen- 
dent of the tree topology, we immediately obtain the following. 

Corollary 2. Suppose T is a 4-taxon tree. Then the topology of T is not
identifiable for the model Mkpars-inf, for any $k \ge 2$.

Corollary 3. Suppose data is generated by the Mkpars-inf or GMkpars-inf model,
on a 4-taxon tree with parameters resulting in a positive probability of observ- 
ing every parsimony-informative pattern. Then any method of inference of the 
tree topology either (a) always returns all three trees, or (b) can be positively 
misleading. 

The proof of Theorem 1 given in Appendix A uses explicit calculations and
topological arguments. 

Note that standard numerical maximum likelihood software generally infers 
a particular tree topology when 4-taxon data produced by the Mkpars-inf model
is analyzed under the misspecified Mk model. Thus this model misspecification
can lead to positively misleading inference. 

For larger trees, one might expect that omitting parsimony-noninformative 
data would result in little loss of information. To establish positive results 
on the identifiability of parameters for the models GMkpars-inf and Mkpars-inf,
we focus on GMkpars-inf, since results about it apply to its submodels. We
separately address the identifiability of the tree topology and identifiability of 
the numerical model parameters, since the tree topology must be fixed before 
the numerical parameters are even meaningful. 

Theorem 4. Suppose $n \ge 8$. Then any n-taxon tree topology is identifiable for
the GMkpars-inf model, and its submodels, such as Mkpars-inf.

Note that we do not claim n = 8 is the minimal number of taxa ensuring
identifiability for either GMkpars-inf or Mkpars-inf, either for all k or for any fixed
choice. Our method of proof simply does not apply when $n \le 7$.

Since from a distribution for the GMkv model one may compute that of
the GMkpars-inf model with the same parameters, we immediately obtain the
following. 

Corollary 5. Suppose $n \ge 8$. Then any n-taxon tree topology is identifiable for
the GMkv model, and its submodels, such as Mkv. 

The proof of Theorem 4 is given in Appendix B, and depends on the construction
of phylogenetic invariants for GMkpars-inf (i.e., polynomials that vanish
on any joint distribution of patterns for GMkpars-inf arising from a fixed
tree topology). These invariants are close in spirit to an encoding of the 
well-known 4-point condition of Buneman (1971), using the log-det distance 
(Cavender and Felsenstein, 1987; Steel, 1994), but the restriction to parsimony- 
informative patterns introduces complications. 
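
For readers unfamiliar with it, the classical 4-point condition itself can be sketched in Python as follows. This illustrates only the metric condition on an exactly tree-additive distance for the tree ab|cd, not the invariants constructed in Appendix B; the helper names and edge lengths are hypothetical.

```python
from itertools import combinations

def tree_metric_quartet(pendant, internal):
    """Path-length distances on the tree ab|cd.

    pendant maps each taxon to its pendant edge length; internal is the
    length of the central edge separating {a, b} from {c, d}.
    """
    side = {"a": 0, "b": 0, "c": 1, "d": 1}
    d = {}
    for x, y in combinations("abcd", 2):
        length = pendant[x] + pendant[y]
        if side[x] != side[y]:
            length += internal
        d[frozenset((x, y))] = length
    return d

def pairing_sums(d, w, x, y, z):
    """The three sums of distances over the pairings of {w, x, y, z}."""
    return {
        ((w, x), (y, z)): d[frozenset((w, x))] + d[frozenset((y, z))],
        ((w, y), (x, z)): d[frozenset((w, y))] + d[frozenset((x, z))],
        ((w, z), (x, y)): d[frozenset((w, z))] + d[frozenset((x, y))],
    }

def four_point_holds(d, w, x, y, z, tol=1e-9):
    """True if the two largest of the three pairing sums are equal."""
    sums = sorted(pairing_sums(d, w, x, y, z).values())
    return abs(sums[2] - sums[1]) <= tol

def inferred_split(d, w, x, y, z):
    """The pairing with the smallest sum, which identifies the quartet split."""
    sums = pairing_sums(d, w, x, y, z)
    return min(sums, key=sums.get)

d = tree_metric_quartet({"a": 0.1, "b": 0.3, "c": 0.2, "d": 0.05}, internal=0.4)
print(four_point_holds(d, "a", "b", "c", "d"), inferred_split(d, "a", "b", "c", "d"))
```

With a positive internal edge, the pairing with the strictly smallest sum recovers the split of the tree.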

Assuming the tree topology is already known, we next consider the identifia- 
bility of numerical parameters. Although our result on identifiability of the tree 
topology required at least 8 taxa, fewer taxa suffice for our remaining arguments. 

For small trees, though, there are certainly instances of non-identifiability. 
For instance, in the 4-taxon case, for either of the models GM2pars-inf or M2pars-inf,
we cannot have identifiability of numerical parameters. The easiest way to see
this is a dimension count: There are only 6 parsimony-informative patterns for
GM2pars-inf on a 4-taxon tree, yet the model has 11 free numerical parameters.
However, the continuous parameterization of the model cannot injectively map
any full-dimensional subset of $\mathbb{R}^{11}$ into a 5-dimensional subspace of $\mathbb{R}^{6}$. In fact,
any distribution arising from the model must arise from infinitely many choices 
of parameters. Similarly, the model M2pars-inf on a 4-taxon tree has 5 free nu- 
merical parameters, but up to symmetry there are only 3 parsimony-informative 
patterns. 
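
The dimension count above can be written out as a short Python sketch. The function names are ours; the counts follow from a root distribution with k - 1 free entries, k(k - 1) free entries per Markov matrix for GMk, and one substitution probability per edge for Mk.

```python
def gmk_free_parameters(k, num_edges):
    """Free numerical parameters of GMk: root distribution plus one matrix per edge."""
    return (k - 1) + num_edges * k * (k - 1)

def mk_free_parameters(num_edges):
    """Free numerical parameters of Mk: one substitution probability per edge."""
    return num_edges

def quartet_informative_patterns(k):
    """Number of parsimony-informative patterns on a 4-taxon tree."""
    return 3 * k * (k - 1)

edges = 5  # a binary 4-taxon tree has 4 pendant edges and 1 central edge
gm2_params = gmk_free_parameters(2, edges)
gm2_dim = quartet_informative_patterns(2) - 1   # probabilities sum to 1
m2_params = mk_free_parameters(edges)
m2_dim = 3 - 1                                  # three pattern forms, summing to 1
print(gm2_params, gm2_dim, m2_params, m2_dim)
```

In both cases the number of free parameters exceeds the dimension of the space of observable distributions, so the parameterization cannot be injective.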

In Appendix C we give the outline of the proof of the following, though the 
work of Appendix E is needed to complete the argument. 

Theorem 6. Suppose $n \ge 7$ and T is a known n-taxon tree. Then numerical
parameters of the model GMkpars-inf, and its submodels, such as Mkpars-inf, on 
T are identifiable, up to choice of a root for T and permutation of the states at 
the internal nodes of T . 

The issue of identifiability of GM parameters only up to a permutation of 
states at internal nodes of T is a well-known one (Allman and Rhodes, 2003), 
arising because the joint distribution gives no information on which hidden 
state is which. Chang (1996) removed this ambiguity through a biologically- 
motivated assumption that all Markov matrices have their largest entries in each 
row appearing on the diagonal. As permuting the states at internal nodes has the 
effect of reordering the rows and columns of the Markov matrices, the highly-structured
pattern of entries in the Markov matrices for Mk enables one to
remove the ambiguity even without Chang's assumption. Thus the identifiability
of numerical parameters for the Mkpars-inf model is, in fact, complete.

For trees with fewer than 7 taxa, we obtain a slightly weaker result on 
identifiability of numerical parameters, as stated and proved in Appendix D. 
Although that result is perhaps of less interest for biological application, we 
include it as it provides a good introduction to the method of proof of Theorem 
6. These proofs again depend on phylogenetic invariants, but invariants not
for the model GMkpars-inf, but rather for GMk (Allman and Rhodes, 2008b,
2007). These invariants lead to algebraic formulas for determining the values of
$p_{\mathbf{i}}$ for all $\mathbf{i} \in [k]^n$ from the values of $q_{\mathbf{i}}$ for those $\mathbf{i} \in \mathcal{I}$. Then the identifiability
of parameters for the GMk model established by Chang (1996) completes the
proof. 

An interesting aspect of the work in Appendix D is that our arguments for the 
M2pars-inf model on 5-taxon trees establish parameter identifiability for generic
parameter choices, but fail under a molecular clock assumption. Thus what 
one might consider the simplest assumption actually leads to a more difficult 
mathematical analysis, due to the symmetries inherent in it.

A. Non-identifiability of 4-taxon trees

Our proof of Theorem 1 will require the notion of the fundamental group
of a space, from algebraic topology. As this does not commonly appear in the
phylogenetics literature, we note that Massey (1991) provides a good development for those
unfamiliar with it. The arguments in this appendix have little in common with 
those of the rest of the paper, so readers interested primarily in other results 
may elect to move on to Appendix B. 

Recall $\Delta$ denotes the 2-dimensional probability simplex, as defined in Section
3. To simplify some formulas, it will be convenient to represent a vector $Q =
(Q_1, Q_2, Q_3) \in \Delta$ by homogeneous coordinates $[Q_1, Q_2, Q_3]$ which are not all
zero and are determined only up to rescaling by a non-zero constant. That is,
$[Q_1, Q_2, Q_3] = [\lambda Q_1, \lambda Q_2, \lambda Q_3]$ for any $\lambda \neq 0$. Thus $[Q'_1, Q'_2, Q'_3]$ represents Q
with $Q_i = Q'_i/(Q'_1 + Q'_2 + Q'_3)$.

Associate to each of the five edges of the tree $a_1a_2|a_3a_4$ a parameter
giving the probability of a substitution occurring on that edge, with $s_1, s_2, s_3, s_4$
denoting the parameters on pendant edges leading to taxa $a_1, a_2, a_3, a_4$,
respectively, and $s_5$ the parameter on the central edge. Thus the Markov matrix $M_i$
has diagonal entries $1 - s_i$ and off-diagonal entries $s_i/(k-1)$. We focus on the
subset of the parameter space defined by

$$S = \{(s_1, s_2, s_3, s_4, s_5) \mid s_i \in (0, 1 - 1/k)\},$$

which corresponds to finite, positive edge lengths. However, for technical reasons
we will also need to consider the extension of the parameterization to the larger
set

$$S' = \{(s_1, s_2, s_3, s_4, s_5) \mid s_i \in [0, 1 - 1/k), \text{ and either } s_5 > 0 \text{ or two } s_i > 0\}.$$

We let

$$\phi : S' \to \Delta$$

denote the (extended) parameterization map giving Q as a function of the 5
edge probabilities.

For any $\epsilon > 0$, let $D_\epsilon = \{Q \in \Delta \mid \min Q_i < \epsilon\}$ denote an open neighborhood
in $\Delta$ of $\partial\Delta$, the boundary of $\Delta$. We also use $\partial\Delta$ to denote a loop, starting
and ending at [1,0,0], parameterizing $\partial\Delta$ in the counterclockwise direction in
Figure 1.

Lemma 7. For any $\epsilon > 0$, there exists a loop $\gamma$ in $S'$ such that $\phi \circ \gamma$ is a loop
in $D_\epsilon$, starting and ending at [1,0,0], that is homotopic in $D_\epsilon$ to $\partial\Delta$.






Figure 1: The two-dimensional probability simplex $\Delta$, with vertices [1,0,0] (bottom left),
[0,1,0] (bottom right), and [0,0,1] (top). The three curves form the loop $\gamma$ constructed
in Lemma 7, for k = 4 and $\delta = 0.3$, with the image of $\phi \circ \gamma_1$ in red, $\phi \circ \gamma_2$ in blue, and $\phi \circ \gamma_3$
in green. For smaller $\delta$, the loop would be closer to the boundary of $\Delta$.



Proof. We construct the loop $\gamma$ (see Figure 1) in three parts, with $\gamma_1$ chosen
so $\phi \circ \gamma_1$ is a path from $[1,0,0]$ to $[0,1,0]$ which is near the edge of $\Delta$ joining
those points, $\gamma_2$ chosen so $\phi \circ \gamma_2$ is a path from $[0,1,0]$ to $[0,0,1]$ which is
near the edge of $\Delta$ joining those points, and $\gamma_3$ chosen so $\phi \circ \gamma_3$ is a path
from $[0,0,1]$ to $[1,0,0]$ which is near the edge of $\Delta$ joining those points. As
the construction requires some explicit elementary, but quite long, calculations,
we provide these in a worksheet file for the computer algebra software Maple,
available as supplementary material on our website (Allman et al., 2009).

One can give an explicit formula for $\phi$ (using, for instance, equations (1),(2),(3)
of Schulmeister (2004)), and check that

\[ \begin{aligned}
\phi(0, 0, 0, s_4, s_5) &= [1, 0, 0], \quad \text{for } s_4, s_5 \in (0, 1 - 1/k), \\
\phi(s_1, 0, 0, s_4, 0) &= [0, 1, 0], \quad \text{for } s_1, s_4 \in (0, 1 - 1/k), \\
\phi(0, s_2, 0, s_4, 0) &= [0, 0, 1], \quad \text{for } s_2, s_4 \in (0, 1 - 1/k).
\end{aligned} \]



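For small $\delta > 0$, and for $t \in [0, 1]$, let

\[ \gamma_1(t) = \left( \frac{2\delta t}{1 + 2\delta t},\ 0,\ 0,\ \frac{1}{3},\ \frac{\delta(1-t)}{1 + \delta(1-t)} \right), \]

so $\gamma_1(0) = \left(0, 0, 0, \frac{1}{3}, \frac{\delta}{1+\delta}\right)$ and $\gamma_1(1) = \left(\frac{2\delta}{1+2\delta}, 0, 0, \frac{1}{3}, 0\right)$, and one computes

\[ \phi \circ \gamma_1(t) = \left[ 1 - t,\ \frac{t}{k-1},\ \frac{(1-t)t\delta}{(k-1)^2} \right]. \]

Note $\phi \circ \gamma_1(0) = [1,0,0]$, $\phi \circ \gamma_1(1) = [0,1,0]$, and there exists a $\delta_1 > 0$ such that
for all $0 < \delta < \delta_1$ the image of $\phi \circ \gamma_1$ lies in $D_\epsilon$.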
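Next, let

\[ \gamma_2(t) = \left( \frac{2\delta(1-t)}{1 + 2\delta(1-t)},\ \frac{2\delta t}{1 + 2\delta t},\ 0,\ \frac{1}{3},\ 0 \right), \]

so $\gamma_2(0) = \left(\frac{2\delta}{1+2\delta}, 0, 0, \frac{1}{3}, 0\right)$ and $\gamma_2(1) = \left(0, \frac{2\delta}{1+2\delta}, 0, \frac{1}{3}, 0\right)$. Then it can be shown
that

\[ \phi \circ \gamma_2(t) = [4(1-t)t\delta,\ 1 - t,\ t], \]

so $\phi \circ \gamma_2(0) = [0,1,0]$ and $\phi \circ \gamma_2(1) = [0,0,1]$. Furthermore, there exists a $\delta_2 > 0$
so that for all $0 < \delta < \delta_2$, the image of $\phi \circ \gamma_2$ lies in $D_\epsilon$.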

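The third segment of the path is defined similarly to the first, with

\[ \gamma_3(t) = \left( 0,\ \frac{2\delta(1-t)}{1 + 2\delta(1-t)},\ 0,\ \frac{1}{3},\ \frac{\delta t}{1 + \delta t} \right), \]

so $\gamma_3(0) = \left(0, \frac{2\delta}{1+2\delta}, 0, \frac{1}{3}, 0\right)$ and $\gamma_3(1) = \left(0, 0, 0, \frac{1}{3}, \frac{\delta}{1+\delta}\right)$. One checks that

\[ \phi \circ \gamma_3(t) = \left[ t,\ \frac{(1-t)\delta t}{(k-1)^2},\ \frac{1-t}{k-1} \right]. \]

Then for some $\delta_3 > 0$, if $\delta < \delta_3$ then $\phi \circ \gamma_3$ is a path in $D_\epsilon$ from $[0,0,1]$ to
$[1,0,0]$.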

Finally, for any $\delta < \min(\delta_1, \delta_2, \delta_3)$ a loop with the desired properties is given
by traversing these paths consecutively, by $\gamma = \gamma_1 * \gamma_2 * \gamma_3$. □
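
The three closed-form images in this proof can be checked independently of the Maple worksheet. The following is a minimal sketch, not the authors' computation: it assumes the Mk model with a uniform root distribution, and assumes that the coordinates of $Q$ are the total probabilities of the pattern classes $a_1a_2|a_3a_4$, $a_1a_4|a_2a_3$, $a_1a_3|a_2a_4$ in that order (an ordering inferred from the three vertex formulas above). The function names `phi`, `gamma1`, `gamma2`, `gamma3`, `claimed`, and `same_point` are hypothetical.

```python
from fractions import Fraction
from itertools import product

def phi(s, k):
    """Unnormalized (Q1, Q2, Q3) by brute-force summation over all states.
    Assumes a uniform root distribution and the class ordering described above."""
    s1, s2, s3, s4, s5 = (Fraction(x) for x in s)
    def step(si, x, y):                      # transition probability x -> y on an edge
        return 1 - si if x == y else si / (k - 1)
    Q = [Fraction(0)] * 3
    for u, v, x1, x2, x3, x4 in product(range(k), repeat=6):
        pr = (Fraction(1, k) * step(s1, u, x1) * step(s2, u, x2) * step(s5, u, v)
              * step(s3, v, x3) * step(s4, v, x4))
        if x1 == x2 != x3 == x4:
            Q[0] += pr
        elif x1 == x4 != x2 == x3:
            Q[1] += pr
        elif x1 == x3 != x2 == x4:
            Q[2] += pr
    return Q

def gamma1(t, d):
    return (2*d*t/(1 + 2*d*t), 0, 0, Fraction(1, 3), d*(1 - t)/(1 + d*(1 - t)))

def gamma2(t, d):
    return (2*d*(1 - t)/(1 + 2*d*(1 - t)), 2*d*t/(1 + 2*d*t), 0, Fraction(1, 3), 0)

def gamma3(t, d):
    return (0, 2*d*(1 - t)/(1 + 2*d*(1 - t)), 0, Fraction(1, 3), d*t/(1 + d*t))

def claimed(part, t, d, k):
    """Closed forms stated in the proof (t and d must be Fractions)."""
    if part == 1:
        return [1 - t, t/(k - 1), (1 - t)*t*d/(k - 1)**2]
    if part == 2:
        return [4*(1 - t)*t*d, 1 - t, t]
    return [t, (1 - t)*d*t/(k - 1)**2, (1 - t)/(k - 1)]

def same_point(Q, R):
    """Equality in homogeneous coordinates: Q and R are proportional."""
    return all(Q[i]*R[j] == Q[j]*R[i] for i in range(3) for j in range(3))

def check(k, d, ts):
    paths = ((1, gamma1), (2, gamma2), (3, gamma3))
    return all(same_point(phi(g(t, d), k), claimed(j, t, d, k))
               for t in ts for j, g in paths)

print(check(4, Fraction(3, 10), [Fraction(n, 4) for n in range(5)]))
```

Exact rational arithmetic is used so that proportionality can be tested with `==` rather than a floating-point tolerance.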

We next obtain a similar result for the parameter space of interest, S. 

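Lemma 8. For any $\epsilon > 0$, there exists a loop $\gamma$ in $S$ such that the loop $\phi \circ \gamma$ is
in $D_\epsilon$ and homotopic in $D_\epsilon$ to $\partial\Delta$.

Proof. By Lemma 7, there is a loop $\gamma'$ in $S'$ such that $\phi \circ \gamma'$ is a loop in $D_\epsilon$ that is
homotopic to $\partial\Delta$ in $D_\epsilon$. Since $\phi^{-1}(D_\epsilon)$ is open in $S'$ and contains the compact
set $\mathrm{im}(\gamma')$, there exists some $\delta' > 0$ such that if $s \in S'$ and $\mathrm{dist}(s, \mathrm{im}(\gamma')) < \delta'$,
then $\phi(s) \in D_\epsilon$. Thus for sufficiently small $\delta > 0$, the loop defined by $\gamma(t) =
\gamma'(t) + \delta(1,1,1,1,1)$ is in $S$ and $\phi \circ \gamma$ has image in $D_\epsilon$. Since $\gamma$ is homotopic to
$\gamma'$ in $\phi^{-1}(D_\epsilon)$, then $\phi \circ \gamma$ is homotopic to $\partial\Delta$ in $D_\epsilon$. □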

Proof of Theorem 1. It is clear that parameters in $S$ lead to positive probabilities
of each parsimony-informative pattern, so $\phi(S) \subseteq \Delta \setminus \partial\Delta$.

Let $P \in \Delta \setminus \partial\Delta$, and suppose $P \notin \phi(S)$. Choose $\epsilon > 0$ so $P \notin D_\epsilon$, and
let $\gamma : [0, 1] \to S$ be a loop whose existence is asserted by Lemma 8. Since a
parameterization of $\partial\Delta$ is non-trivial in the fundamental group $\pi_1(\Delta \setminus \{P\})$,
$\phi \circ \gamma$ is non-trivial in that fundamental group as well.

However, since $S$ is contractible, there is a homotopy $g$ deforming $\gamma$ to a
constant map. Then $h = \phi \circ g$ is a homotopy in $\Delta$ deforming $\phi \circ \gamma$ to a constant
map. Since $P \notin \phi(S)$, this is actually a homotopy in $\Delta \setminus \{P\}$. Thus $\phi \circ \gamma$ is
trivial in the fundamental group. This contradiction shows $P \in \phi(S)$.

Thus $\phi(S) = \Delta \setminus \partial\Delta$. □
B. Identifiability of larger trees

Our argument establishing Theorem 4 is at some level similar to ones es- 
tablishing tree identifiability for more standard models using the existence of a 
phylogenetic distance. However, because of the filtered nature of the model, we 
cannot easily define a distance directly. Instead, we construct certain phyloge- 
netic invariants that can distinguish tree topologies. While these invariants are 
motivated by a statement of the 4-point condition for the log-det distance, the 
details of the construction are much more involved. 

For proving both Theorem 4 and subsequent results, it will be convenient
to use the following notation. Suppose for some choice of parameters for the
model GM$k$ on an $n$-taxon tree the resulting distribution of patterns is given
by $\{p_{\mathbf{i}}\}_{\mathbf{i} \in [k]^n}$. Then let $P$ denote the $k \times \cdots \times k$ $n$-dimensional array whose
entries are $P(i_1, \dots, i_n) = p_{i_1 \dots i_n}$. Similarly, for the same parameters for the
model GM$k$pars-inf, suppose the resulting distribution of parsimony-informative
patterns is given by $\{q_{\mathbf{i}}\}_{\mathbf{i} \in \mathcal{I}}$. Then let $Q$ denote a $k \times \cdots \times k$ $n$-dimensional array
whose entries are $Q(i_1, \dots, i_n) = q_{i_1 \dots i_n}$ for $i_1 \cdots i_n \in \mathcal{I}$, and are undefined for
$i_1 \cdots i_n \notin \mathcal{I}$. (In this section, we will avoid reference to any undefined entries of
$Q$, but in subsequent sections we will give meaning to them.)

Definition. Suppose $S$ is some subset of the taxa $\{a_1, \dots, a_n\}$. Then for any
pattern $\mathbf{i} \in [k]^n$, let $\mathrm{proj}_S(\mathbf{i})$ denote the vector in $[k]^{|S|}$ of only those components
$i_j$ of $\mathbf{i}$ with $a_j \in S$. Thus $\mathrm{proj}_S(\mathbf{i})$ is the subpattern of $\mathbf{i}$ of states at the taxa in
$S$.

Proof of Theorem 4. By Theorem 6.3.5 of Semple and Steel (2003), it is enough
to show we can identify the topology of the induced subtree for every quartet
of taxa. Without loss of generality, we may focus on identifying the topological
tree relating $a_1, a_2, a_3, a_4$. We may also assume the tree is rooted at the node
of our choice: the node where paths leading from $a_1$, $a_2$, and $a_3$ join.

Let $S = \{a_5, \dots, a_n\}$. Choose and fix any pattern $\mathbf{i}_0 = (i_5, i_6, \dots, i_n) \in
[k]^{n-4}$ of states for taxa in $S$ that is parsimony-informative for $S$. (This requires
that $n \geq 8$.) Consider the 4-dimensional array $Q_0$ whose entries are all $q_{\mathbf{i}}$ such
that $\mathrm{proj}_S(\mathbf{i}) = \mathbf{i}_0$. This is a 4-dimensional 'slice' of the array $Q$ in which only
the states at taxa $a_1, a_2, a_3, a_4$ vary. However, $Q_0$ has no undefined entries, as
all its entries arise from patterns in $\mathcal{I}$.
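
The claim that the slice has no undefined entries can be illustrated with a short sketch. It assumes the standard notion of a parsimony-informative pattern (at least two states each occurring at two or more taxa); the helper names `proj`, `is_informative`, and `slice_patterns` are hypothetical, and taxa are numbered from 1.

```python
from collections import Counter
from itertools import product

def proj(i, S):
    """Subpattern of the pattern i at the taxa a_j with j in S (taxa numbered from 1)."""
    return tuple(i[j - 1] for j in sorted(S))

def is_informative(pattern):
    """At least two states each occur at two or more taxa."""
    return sum(count >= 2 for count in Counter(pattern).values()) >= 2

def slice_patterns(i0, k):
    """All patterns on n = 4 + len(i0) taxa whose projection to a_5, ..., a_n is i0."""
    return [x + tuple(i0) for x in product(range(k), repeat=4)]

i0 = (0, 0, 1, 1)                      # informative on the last four of n = 8 taxa
patterns = slice_patterns(i0, k=3)
print(all(is_informative(i) for i in patterns))
```

Because the fixed subpattern already contains two states twice, every extension to the first four taxa stays informative, which is why at least four taxa beyond the quartet are needed.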

Next we apply the essential idea behind the log-det distance on 4-taxon trees,
but modify it to deal with the array $Q_0$. Our argument is similar to that of
Steel (1994), but new details require a full presentation.

Suppose the true quartet tree relating $a_1, a_2, a_3, a_4$ displays the split $a_1a_2|a_3a_4$.
Then to each of the 4 (in the unresolved case) or 5 edges $e$ of the quartet tree,
we associate a matrix $N_e$ in the following way:

Any edge $e$ in the quartet tree corresponds to a path $e_1, e_2, \dots, e_r$ in the full
tree $T$, possibly with branches leading off toward some of the $a_i$ with $i \geq 5$, as
illustrated by the representative cartoons of Figure 2.
Figure 2: Edges $e$ (shown in bold) in the induced quartet tree correspond to paths $e_1, e_2, \dots, e_r$
in the full tree $T$. The lines leading off of $e$ represent the subtrees relating some subcollection
of taxa $a_i$, with $i \geq 5$, which are attached in $T$ to nodes along the path. The common root of
$T$ and the quartet tree is marked with a large dot.

Consider first a binary tree $T$. Each subtree of $T$ coming off the path at the
node at the end of an $e_i$ contains leaves labeled by taxa in a set $S_i \subset S$. To
this subtree, associate a vector $v_i \in (0, 1)^k$ giving the conditional probabilities
that each of the states at this node produces the pattern $\mathrm{proj}_{S_i}(\mathbf{i}_0)$. While
polynomial formulas could be given for these vectors in terms of entries of the
Markov matrix parameters, we do not need explicit expressions, so we omit
them. Now to an edge $e$ in the quartet tree associate the matrix

\[ N_e = M_{e_1} \operatorname{diag}(v_1) M_{e_2} \operatorname{diag}(v_2) M_{e_3} \cdots \operatorname{diag}(v_{r-1}) M_{e_r}, \tag{2} \]

where the $M_{e_i}$ are the Markov matrix parameters on $T$. Thus $N_e$ gives probabilities
of changes to all states at the end of $e$ and to $\mathrm{proj}_{S_i}(\mathbf{i}_0)$ at the taxa in
$S_i$ conditioned on the state at the start of $e$.

If $T$ is not a binary tree, this expression for $N_e$ is not yet well defined.
By specifying that subtrees attached to internal nodes of the quartet tree are
considered to be attached to specific pendant quartet tree edges, we remove
some ambiguity, though the expression for $N_e$ for pendant edges may now begin
with one or more diagonal matrices, rather than an $M_{e_i}$. We also must allow
more than one adjacent diagonal matrix factor in the expression for $N_e$ given in
equation (2) due to multifurcations in $T$ along $e$. In case the quartet tree is also
not binary, we may for convenience consider a resolved quartet tree and assume
the product associated to the internal quartet edge is empty, with $N_e = I$. Note
that by our assumption that all $M_{e_i}$ have all positive entries, the non-binary
quartet tree is the only case in which any $N_e = I$, and otherwise all entries of
$N_e$ are positive.

In all cases, our hypotheses ensure $N_e$ is non-singular.

Now for the quartet tree associated to the split $a_1a_2|a_3a_4$, let $N_i$, $i = 1, 2, 3, 4$
be the four such matrices associated to the edges leading to the leaves, and $N_5$
the matrix associated to the interior edge, as described above. Redefine the sets
$S_i \subset S$ to be the set of taxa $a_j$, $j \geq 5$, which are in subtrees of $T$ coming off
of each of those five quartet edges. The entries of the matrices $N_i$ then give
conditional probabilities, conditioned on the state at the start of the quartet
edge, of observing each state at the end of the quartet edge and also observing
$\mathrm{proj}_{S_i}(\mathbf{i}_0)$. Although their entries are probabilities, the $N_i$ are typically not
Markov matrices, as entries in each row add to 1 only when $S_i = \emptyset$.

For $i = 1, 2, 3, 4$, let $w_i = N_i \mathbf{1}$ where $\mathbf{1}$ is the column vector with all entries
1. The entries of $w_i$, therefore, give the probabilities of observing $\mathrm{proj}_{S_i}(\mathbf{i}_0)$,
conditioned on the state at the start of the pendant quartet edge, since we are
simply marginalizing $N_i$ over the index corresponding to $a_i$.

Let $w_{34}$ be the column vector of probabilities of observing $\mathrm{proj}_{S_3 \cup S_4 \cup S_5}(\mathbf{i}_0)$
conditioned on the state at the root, so

\[ w_{34} = N_5 \operatorname{diag}(w_3) \operatorname{diag}(w_4) \mathbf{1} = N_5 \operatorname{diag}(w_3) w_4 = N_5 \operatorname{diag}(w_4) w_3. \]

Let $w_{12}$ be the vector of probabilities of observing $\mathrm{proj}_{S_1 \cup S_2 \cup S_5}(\mathbf{i}_0)$, conditioned
on the state at the node where the quartet edges leading to taxa $a_3, a_4$ join.
Using Bayes' formula to 'reroot' the quartet tree at the second internal node,
we similarly find

\[ \begin{aligned}
w_{12} &= \operatorname{diag}(\pi N_5)^{-1} N_5^T \operatorname{diag}(\pi) \operatorname{diag}(w_1) w_2 \\
       &= \operatorname{diag}(\pi N_5)^{-1} N_5^T \operatorname{diag}(\pi) \operatorname{diag}(w_2) w_1.
\end{aligned} \]

Under our hypotheses, all entries of every $w_i$ and $w_{ij}$ are positive, as there is
a positive conditional probability of every state change occurring on every edge
of the full tree.

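We now have the following matrix formulas expressing 2-dimensional marginalizations
of $Q_0$ in terms of model parameters:

\[ \begin{aligned}
Q_0(\cdot,\cdot,+,+) &:= \sum_{i,j \in [k]} Q_0(\cdot,\cdot,i,j) = N_1^T \operatorname{diag}(\pi) \operatorname{diag}(w_{34}) N_2, \\
Q_0(\cdot,+,\cdot,+) &:= \sum_{i,j \in [k]} Q_0(\cdot,i,\cdot,j) = N_1^T \operatorname{diag}(\pi) \operatorname{diag}(w_2) N_5 \operatorname{diag}(w_4) N_3, \\
Q_0(\cdot,+,+,\cdot) &:= \sum_{i,j \in [k]} Q_0(\cdot,i,j,\cdot) = N_1^T \operatorname{diag}(\pi) \operatorname{diag}(w_2) N_5 \operatorname{diag}(w_3) N_4, \\
Q_0(+,\cdot,\cdot,+) &:= \sum_{i,j \in [k]} Q_0(i,\cdot,\cdot,j) = N_2^T \operatorname{diag}(\pi) \operatorname{diag}(w_1) N_5 \operatorname{diag}(w_4) N_3, \\
Q_0(+,\cdot,+,\cdot) &:= \sum_{i,j \in [k]} Q_0(i,\cdot,j,\cdot) = N_2^T \operatorname{diag}(\pi) \operatorname{diag}(w_1) N_5 \operatorname{diag}(w_3) N_4, \\
Q_0(+,+,\cdot,\cdot) &:= \sum_{i,j \in [k]} Q_0(i,j,\cdot,\cdot) = N_3^T \operatorname{diag}(\pi N_5) \operatorname{diag}(w_{12}) N_4.
\end{aligned} \]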

These imply

\[ \det(Q_0(\cdot,+,\cdot,+)) \det(Q_0(+,\cdot,+,\cdot)) - \det(Q_0(\cdot,+,+,\cdot)) \det(Q_0(+,\cdot,\cdot,+)) = 0. \tag{3} \]

As the left hand side of this equation is a polynomial in the $q_{\mathbf{i}}$, $\mathbf{i} \in \mathcal{I}$, it is a
phylogenetic invariant for the model GM$k$pars-inf. It is analogous to the 4-point
distance identity $d(a_1, a_3) + d(a_2, a_4) = d(a_1, a_4) + d(a_2, a_3)$, and it must vanish
on any distribution arising from GM$k$pars-inf in which the induced quartet tree
on the first four taxa displays the split $a_1a_2|a_3a_4$. Two invariants similar to that
of equation (3) can be constructed that will vanish if the quartet tree displays
the other possible splits. For the split $a_1a_3|a_2a_4$ we have

\[ \det(Q_0(\cdot,\cdot,+,+)) \det(Q_0(+,+,\cdot,\cdot)) - \det(Q_0(\cdot,+,+,\cdot)) \det(Q_0(+,\cdot,\cdot,+)) = 0, \tag{4} \]

and for the split $a_1a_4|a_2a_3$

\[ \det(Q_0(\cdot,\cdot,+,+)) \det(Q_0(+,+,\cdot,\cdot)) - \det(Q_0(\cdot,+,\cdot,+)) \det(Q_0(+,\cdot,+,\cdot)) = 0. \tag{5} \]
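
To see why equation (3) follows from the marginalization formulas, it may help to write out both products of determinants. The following worked step uses only multiplicativity of the determinant and $\det(N^T) = \det(N)$; the grouping into two bracketed factors is ours, for illustration.

```latex
\begin{align*}
\det Q_0(\cdot,+,\cdot,+)\,\det Q_0(+,\cdot,+,\cdot)
 &= \bigl[\det N_1 \det\operatorname{diag}(\pi) \det\operatorname{diag}(w_2) \det N_5 \det\operatorname{diag}(w_4) \det N_3\bigr] \\
 &\quad\times \bigl[\det N_2 \det\operatorname{diag}(\pi) \det\operatorname{diag}(w_1) \det N_5 \det\operatorname{diag}(w_3) \det N_4\bigr], \\
\det Q_0(\cdot,+,+,\cdot)\,\det Q_0(+,\cdot,\cdot,+)
 &= \bigl[\det N_1 \det\operatorname{diag}(\pi) \det\operatorname{diag}(w_2) \det N_5 \det\operatorname{diag}(w_3) \det N_4\bigr] \\
 &\quad\times \bigl[\det N_2 \det\operatorname{diag}(\pi) \det\operatorname{diag}(w_1) \det N_5 \det\operatorname{diag}(w_4) \det N_3\bigr].
\end{align*}
```

Both right-hand sides contain exactly the same scalar factors, namely $\det(\operatorname{diag}(\pi))^2 \det(N_5)^2 \prod_{i=1}^{4} \det(N_i) \det(\operatorname{diag}(w_i))$, so their difference is zero.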

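To show that we can use these invariants to identify tree topologies, we need
only establish strict inequalities analogous to the distance inequality $d(a_1, a_2) +
d(a_3, a_4) < d(a_1, a_3) + d(a_2, a_4)$ which holds provided the central edge of a
quartet tree displaying $a_1a_2|a_3a_4$ has non-zero length. Doing so would imply
that for the fully resolved quartet tree exactly one of the three equations (3),
(4), and (5) can hold. As the formula for the log-det distance involves a minus
sign, we reverse the inequality and, assuming $N_5 \neq I$, so all entries of $N_5$ are
positive, we seek to show

\[ \det(Q_0(\cdot,\cdot,+,+)) \det(Q_0(+,+,\cdot,\cdot)) > \det(Q_0(\cdot,+,\cdot,+)) \det(Q_0(+,\cdot,+,\cdot)). \]

By the expressions for the marginalizations above, this is equivalent to

\[ \begin{aligned}
&\det(N_1^T \operatorname{diag}(\pi) \operatorname{diag}(w_{34}) N_2) \det(N_3^T \operatorname{diag}(\pi N_5) \operatorname{diag}(w_{12}) N_4) > \\
&\qquad \det(N_1^T \operatorname{diag}(\pi) \operatorname{diag}(w_2) N_5 \operatorname{diag}(w_4) N_3) \times
 \det(N_2^T \operatorname{diag}(\pi) \operatorname{diag}(w_1) N_5 \operatorname{diag}(w_3) N_4),
\end{aligned} \]

or, since the $N_i$ and $\operatorname{diag}(\pi)$ are non-singular,

\[ \begin{aligned}
&\det(\operatorname{diag}(\pi N_5)) \det(\operatorname{diag}(w_{12})) \det(\operatorname{diag}(w_{34})) > \\
&\qquad \det(N_5)^2 \det(\operatorname{diag}(\pi) \operatorname{diag}(w_1) \operatorname{diag}(w_2) \operatorname{diag}(w_3) \operatorname{diag}(w_4)),
\end{aligned} \]

or, using the above expressions for the $w_{ij}$,

\[ \begin{aligned}
&\det(\operatorname{diag}(\pi N_5)) \det(\operatorname{diag}(\operatorname{diag}(\pi N_5)^{-1} N_5^T \operatorname{diag}(\pi) \operatorname{diag}(w_1) w_2)) \times
 \det(\operatorname{diag}(N_5 \operatorname{diag}(w_3) w_4)) > \\
&\qquad \det(N_5)^2 \det(\operatorname{diag}(\pi) \operatorname{diag}(w_1) \operatorname{diag}(w_2) \operatorname{diag}(w_3) \operatorname{diag}(w_4)).
\end{aligned} \tag{6} \]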

To establish inequality (6) we will use the following: 

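Lemma 9. Suppose $A$ is an $n \times n$ matrix with positive entries, and the row vector
$x \in \mathbb{R}^n$ has positive entries. Then

\[ \det(\operatorname{diag}(xA)) > |\det A| \det(\operatorname{diag}(x)), \]

and

\[ \det(\operatorname{diag}(Ax^T)) > |\det A| \det(\operatorname{diag}(x)). \]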
Proof. We prove the $2 \times 2$ case here as an illustration. The general proof can
be extracted from Steel (1994).

With $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$, $x = (x, y)$, since $a, b, c, d, x, y > 0$, the first inequality
follows from

\[ (ax + cy)(bx + dy) > adxy + bcxy > |ad - bc|\,xy. \]

The second inequality follows from applying the first to the transpose of $A$. □
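
Lemma 9 is easy to spot-check in exact arithmetic for small matrices. The sketch below is only an illustration on a few hand-picked positive matrices, not a proof; the helper names `det` and `lemma9_holds` are hypothetical, and the determinant is computed by the Leibniz formula, which is adequate only for small $n$.

```python
from fractions import Fraction
from itertools import permutations
from math import prod

def det(A):
    """Determinant by the Leibniz formula (fine for small n)."""
    n = len(A)
    total = 0
    for perm in permutations(range(n)):
        inversions = sum(perm[i] > perm[j] for i in range(n) for j in range(i + 1, n))
        total += (-1) ** inversions * prod(A[i][perm[i]] for i in range(n))
    return total

def lemma9_holds(A, x):
    """Check both inequalities of Lemma 9; det(diag(v)) is the product of the entries of v."""
    n = len(A)
    xA = [sum(x[i] * A[i][j] for i in range(n)) for j in range(n)]   # row vector x A
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]   # column vector A x^T
    bound = abs(det(A)) * prod(x)
    return prod(xA) > bound and prod(Ax) > bound

A = [[Fraction(2), Fraction(1), Fraction(1)],
     [Fraction(1), Fraction(3), Fraction(1)],
     [Fraction(1), Fraction(1), Fraction(4)]]
x = [Fraction(1), Fraction(2), Fraction(3)]
print(lemma9_holds(A, x))
```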

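Now to establish inequality (6), by applying Lemma 9 twice, it is enough to
show

\[ \begin{aligned}
&\det(\operatorname{diag}(\pi N_5)) \cdot |\det(\operatorname{diag}(\pi N_5)^{-1} N_5^T \operatorname{diag}(\pi) \operatorname{diag}(w_1))| \det(\operatorname{diag}(w_2)) \cdot
 |\det(N_5 \operatorname{diag}(w_3))| \det(\operatorname{diag}(w_4)) \geq \\
&\qquad \det(N_5)^2 \det(\operatorname{diag}(\pi) \operatorname{diag}(w_1) \operatorname{diag}(w_2) \operatorname{diag}(w_3) \operatorname{diag}(w_4)).
\end{aligned} \]

After canceling many non-zero determinants appearing on both sides of this
inequality, we see it simply states that $1 \geq 1$. □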



C. Identifiability of numerical parameters 

The full proof of Theorem 6, on identifiability of numerical model param- 
eters, depends upon a key technical lemma. This lemma requires extensive 
arguments that are deferred to Appendix E. To motivate the lemma, and make 
the flow of the larger argument clearer, we first give the proof of the Theorem 
assuming that lemma is known. 

Proof of Theorem 6. For $\mathbf{i} \in \mathcal{I}$, $q_{\mathbf{i}}$ has been defined in equation (1), as the
conditional probability of observing pattern $\mathbf{i}$ given that a parsimony-informative
pattern is observed. For mathematical convenience, we extend the definition
of $q_{\mathbf{i}}$ by the formula in equation (1) to all $\mathbf{i}$, but do not give a probabilistic
interpretation to its meaning for $\mathbf{i} \notin \mathcal{I}$. We emphasize that the denominator in
this definition remains a sum only over $\mathbf{i} \in \mathcal{I}$.

In Appendix E, Lemma 19 will show that from the $q_{\mathbf{i}}$ with $\mathbf{i} \in \mathcal{I}$ arising from
the GM$k$pars-inf model on a known tree of at least 7 taxa, we may determine
all $q_{\mathbf{i}}$ with $\mathbf{i} \notin \mathcal{I}$. As motivating and proving this lemma requires an extended
exposition, we simply assume the result for now.

By equation (1) we know that for $\mathbf{i} \in [k]^n$ the $p_{\mathbf{i}}$ can be obtained from the
$q_{\mathbf{i}}$ by rescaling by the (unknown) factor $\rho = \sum_{\mathbf{i} \in \mathcal{I}} p_{\mathbf{i}}$. Since $\sum_{\mathbf{i} \in [k]^n} p_{\mathbf{i}} = 1$,
however, we may determine $\rho$ by the formula $\rho = 1/\bigl(\sum_{\mathbf{i} \in [k]^n} q_{\mathbf{i}}\bigr)$. Thus we can
determine all $p_{\mathbf{i}}$ from all $q_{\mathbf{i}}$.
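
The rescaling step above amounts to one line of arithmetic. As a minimal sketch, assuming the extended values $q_{\mathbf{i}}$ for all patterns are already available as a dictionary (the name `recover_p`, the variable `rho`, and the toy four-pattern distribution are hypothetical, chosen only for illustration):

```python
from fractions import Fraction

def recover_p(q):
    """Given the extended q_i for *all* patterns i, return (rho, p) with p_i = rho * q_i."""
    rho = 1 / sum(q.values())          # since the p_i sum to 1
    return rho, {i: rho * qi for i, qi in q.items()}

# Toy illustration: four patterns, of which "a" and "b" play the role of informative ones.
p_true = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 8), "d": Fraction(1, 8)}
rho_true = p_true["a"] + p_true["b"]                  # sum over the informative patterns
q = {i: pi / rho_true for i, pi in p_true.items()}    # equation (1), extended to all i

rho, p = recover_p(q)
print(rho, p == p_true)
```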

Finally, with all $p_{\mathbf{i}}$ known, we can apply the identifiability result of Chang
(1996) on the GM$k$ model to complete the argument. Chang's formulation actually
requires additional assumptions on the GM$k$ model parameters ('diagonal
dominance in rows') which enable one to determine the ordering of the rows
and columns of each Markov matrix parameter. As we have not made such an
assumption, we note his argument shows the parameters are only determined
up to permutations of states at the internal nodes of the tree. □



As this proof outline indicates, the major step is in establishing Lemma 19.
Although not logically necessary, to motivate the proof of that lemma, we first
investigate the 5-taxon tree case for the model GM2pars-inf in the next section.
Complications will arise, due to the possibility that certain expressions may be 
zero. That will lead us to first establish identifiability for generic parameters 
in the 5-taxon case, and then investigate whether exceptional non-identifiable 
choices of parameters may exist. 

D. Identifiability of numerical parameters: the 5-taxon, GM2pars-inf case



Following the proof of Theorem 6, to establish identifiability of numerical
parameters for the GM2pars-inf model on a 5-taxon tree, it would be enough to
show the $q_{\mathbf{i}}$ for $\mathbf{i} \in \mathcal{I}$ determine those for $\mathbf{i} \notin \mathcal{I}$. Although we will see this is not
true in complete generality, investigating the conditions under which it is true
will raise some interesting further questions, as well as point the way toward
Lemma 19.

We need the following result, a special case of a more general theorem
proved by Allman and Rhodes (2008b). (For a more expository presentation,
see Allman and Rhodes (2007).)

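Theorem 10. For the GM2 model on a 5-taxon binary tree as shown in Figure
3, let $\{0, 1\}$ denote the set of character states. Let $p_{i_1i_2i_3i_4i_5}$ denote the joint
probability of observing state $i_j$ in the sequence at leaf $a_j$, $j = 1, \dots, 5$. Then the
ideal of phylogenetic invariants for this model is generated by the $3 \times 3$ minors
of the following two matrices:

\[ F_1 = \begin{pmatrix}
p_{00000} & p_{00001} & p_{00010} & p_{00011} & p_{00100} & p_{00101} & p_{00110} & p_{00111} \\
p_{01000} & p_{01001} & p_{01010} & p_{01011} & p_{01100} & p_{01101} & p_{01110} & p_{01111} \\
p_{10000} & p_{10001} & p_{10010} & p_{10011} & p_{10100} & p_{10101} & p_{10110} & p_{10111} \\
p_{11000} & p_{11001} & p_{11010} & p_{11011} & p_{11100} & p_{11101} & p_{11110} & p_{11111}
\end{pmatrix} \]

and

\[ F_2 = \begin{pmatrix}
p_{00000} & p_{00001} & p_{00010} & p_{00011} \\
p_{00100} & p_{00101} & p_{00110} & p_{00111} \\
p_{01000} & p_{01001} & p_{01010} & p_{01011} \\
p_{01100} & p_{01101} & p_{01110} & p_{01111} \\
p_{10000} & p_{10001} & p_{10010} & p_{10011} \\
p_{10100} & p_{10101} & p_{10110} & p_{10111} \\
p_{11000} & p_{11001} & p_{11010} & p_{11011} \\
p_{11100} & p_{11101} & p_{11110} & p_{11111}
\end{pmatrix}. \]

Figure 3: A 5-taxon tree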



A few comments may make this theorem clearer. The matrices $F_1$ and
$F_2$ are the two natural 2-dimensional 'flattenings' of the 5-dimensional joint
distribution array according to the splits corresponding to the two internal edges
of the tree. The splits are $\{\{a_1, a_2\}, \{a_3, a_4, a_5\}\}$ and $\{\{a_1, a_2, a_3\}, \{a_4, a_5\}\}$,
and the indices of the matrix entries are such that states are held constant in
one of these sets as one moves across rows or down columns.

Recall that a $3 \times 3$ minor of a matrix is defined as the determinant of a $3 \times 3$
submatrix obtained by deleting all but 3 rows and all but 3 columns. Thus each
of these matrices has $4\binom{8}{3} = 224$ such minors. Saying these 448 polynomials
are phylogenetic invariants means that they evaluate to 0 on any distribution
arising from the model. We view each of these polynomials as specifying an
algebraic relationship between the various $p_{\mathbf{i}}$.
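Of course these relationships imply algebraic relationships between the $q_{\mathbf{i}}$ as
well.

Corollary 11. Every $3 \times 3$ minor of the two matrices $\tilde{F}_1, \tilde{F}_2$ obtained from
$F_1, F_2$ by replacing all $p_{\mathbf{i}}$ by $q_{\mathbf{i}}$ equals zero, if the $q_{\mathbf{i}}$ arise from the GM2 model
on the 5-taxon tree.

Proof. Since the matrices with entries $q_{\mathbf{i}}$ are simply rescalings of those with
entries $p_{\mathbf{i}}$, this follows from the fact that determinants are homogeneous
polynomials. □

Thus we know many algebraic relationships between the $q_{\mathbf{i}}$. We now exploit
these to determine the $q_{\mathbf{i}}$, $\mathbf{i} \notin \mathcal{I}$ from the $q_{\mathbf{i}}$, $\mathbf{i} \in \mathcal{I}$.

Consider first the matrix $\tilde{F}_1$, where we use an underscore, as in '$\underline{q}_{\mathbf{i}}$', to
highlight those entries where $\mathbf{i} \notin \mathcal{I}$ (i.e., the entries we wish to determine).

\[ \tilde{F}_1 = \begin{pmatrix}
\underline{q}_{00000} & \underline{q}_{00001} & \underline{q}_{00010} & q_{00011} & \underline{q}_{00100} & q_{00101} & q_{00110} & q_{00111} \\
\underline{q}_{01000} & q_{01001} & q_{01010} & q_{01011} & q_{01100} & q_{01101} & q_{01110} & \underline{q}_{01111} \\
\underline{q}_{10000} & q_{10001} & q_{10010} & q_{10011} & q_{10100} & q_{10101} & q_{10110} & \underline{q}_{10111} \\
q_{11000} & q_{11001} & q_{11010} & \underline{q}_{11011} & q_{11100} & \underline{q}_{11101} & \underline{q}_{11110} & \underline{q}_{11111}
\end{pmatrix} \]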



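Focusing on the minor using rows 2,3,4 and columns 2,3,4, we find

\[ \begin{vmatrix}
q_{01001} & q_{01010} & q_{01011} \\
q_{10001} & q_{10010} & q_{10011} \\
q_{11001} & q_{11010} & \underline{q}_{11011}
\end{vmatrix} = 0. \]

Expanding the determinant in cofactors by the last column we have

\[ q_{01011} \begin{vmatrix} q_{10001} & q_{10010} \\ q_{11001} & q_{11010} \end{vmatrix}
 - q_{10011} \begin{vmatrix} q_{01001} & q_{01010} \\ q_{11001} & q_{11010} \end{vmatrix}
 + \underline{q}_{11011} \begin{vmatrix} q_{01001} & q_{01010} \\ q_{10001} & q_{10010} \end{vmatrix} = 0. \]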



Thus, provided

\[ \begin{vmatrix} q_{01001} & q_{01010} \\ q_{10001} & q_{10010} \end{vmatrix} \neq 0, \]

we can express $q_{11011}$ in terms of only $q_{\mathbf{i}}$ with $\mathbf{i} \in \mathcal{I}$. Assuming the non-vanishing
of this $2 \times 2$ minor, then, we see $q_{11011}$ is determined by the $q_{\mathbf{i}}$ for $\mathbf{i} \in \mathcal{I}$. More
generally, as long as any one of the three $2 \times 2$ minors built from rows 2,3 and
two of the columns 2,3,5 is non-zero, a similar argument shows $q_{11011}$, $q_{11101}$,
and $q_{11110}$ can all be determined. Note that the non-vanishing of at least one of
these minors is equivalent to the condition that the $\{2, 3\}$-$\{2, 3, 5\}$ submatrix

\[ L_1 = \begin{pmatrix} q_{01001} & q_{01010} & q_{01100} \\ q_{10001} & q_{10010} & q_{10100} \end{pmatrix} \]

has rank 2.

We similarly see that provided the $\{2, 3\}$-$\{4, 6, 7\}$ submatrix

\[ L_2 = \begin{pmatrix} q_{01011} & q_{01101} & q_{01110} \\ q_{10011} & q_{10101} & q_{10110} \end{pmatrix} \]

has rank 2, then $q_{00001}$, $q_{00010}$, and $q_{00100}$ are also determined.
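We now consider the other matrix,

\[ \tilde{F}_2 = \begin{pmatrix}
\underline{q}_{00000} & \underline{q}_{00001} & \underline{q}_{00010} & q_{00011} \\
\underline{q}_{00100} & q_{00101} & q_{00110} & q_{00111} \\
\underline{q}_{01000} & q_{01001} & q_{01010} & q_{01011} \\
q_{01100} & q_{01101} & q_{01110} & \underline{q}_{01111} \\
\underline{q}_{10000} & q_{10001} & q_{10010} & q_{10011} \\
q_{10100} & q_{10101} & q_{10110} & \underline{q}_{10111} \\
q_{11000} & q_{11001} & q_{11010} & \underline{q}_{11011} \\
q_{11100} & \underline{q}_{11101} & \underline{q}_{11110} & \underline{q}_{11111}
\end{pmatrix}. \]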



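Provided its $\{2, 3, 5\}$-$\{2, 3\}$ and $\{4, 6, 7\}$-$\{2, 3\}$ submatrices

\[ L_3 = \begin{pmatrix} q_{00101} & q_{00110} \\ q_{01001} & q_{01010} \\ q_{10001} & q_{10010} \end{pmatrix}
\quad \text{and} \quad
L_4 = \begin{pmatrix} q_{01101} & q_{01110} \\ q_{10101} & q_{10110} \\ q_{11001} & q_{11010} \end{pmatrix} \]

also have rank 2 we similarly can determine $q_{00000}$, $q_{01000}$, $q_{10000}$, $q_{10111}$, $q_{01111}$,
and $q_{11111}$. Note that for the determination of $q_{00000}$ and $q_{11111}$ we need values
of some of the $q_{\mathbf{i}}$ that have already been determined.
We've thus established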
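Lemma 12. Provided all 4 of the matrices

\[ L_1 = \begin{pmatrix} q_{01001} & q_{01010} & q_{01100} \\ q_{10001} & q_{10010} & q_{10100} \end{pmatrix}, \quad
L_2 = \begin{pmatrix} q_{01011} & q_{01101} & q_{01110} \\ q_{10011} & q_{10101} & q_{10110} \end{pmatrix}, \]

\[ L_3 = \begin{pmatrix} q_{00101} & q_{00110} \\ q_{01001} & q_{01010} \\ q_{10001} & q_{10010} \end{pmatrix}, \quad
L_4 = \begin{pmatrix} q_{01101} & q_{01110} \\ q_{10101} & q_{10110} \\ q_{11001} & q_{11010} \end{pmatrix} \]

have rank 2, then the $q_{\mathbf{i}}$, $\mathbf{i} \in \mathcal{I}$ determine all $q_{\mathbf{i}}$, $\mathbf{i} \in [k]^n$.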
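
The first step of this recovery can be tried on a concrete example in exact arithmetic. The sketch below is an illustration only: the rational parameter values are hypothetical, the tree is assumed to have cherries $(a_1, a_2)$ and $(a_4, a_5)$ with the root at the node joining $a_1, a_2$, an internal edge to the node where $a_3$ attaches, and a second internal edge to the node joining $a_4, a_5$. It computes all joint probabilities, forms the extended $q_{\mathbf{i}}$, and then recovers $q_{11011}$ from parsimony-informative entries alone via the cofactor expansion.

```python
from fractions import Fraction as Fr
from itertools import product

def markov(a, b):
    """2x2 Markov matrix with P(0->1) = a and P(1->0) = b."""
    return [[1 - a, a], [b, 1 - b]]

# Hypothetical rational parameters, chosen only for illustration.
pi = [Fr(2, 5), Fr(3, 5)]
M1, M2, M3 = markov(Fr(1, 10), Fr(1, 5)), markov(Fr(3, 10), Fr(2, 5)), markov(Fr(1, 5), Fr(1, 10))
M4, M5 = markov(Fr(1, 4), Fr(1, 8)), markov(Fr(1, 3), Fr(1, 6))
M6, M7 = markov(Fr(1, 7), Fr(2, 7)), markov(Fr(1, 6), Fr(1, 3))   # internal edges

def joint(i):
    """Joint probability of the pattern i, summing over the three internal node states."""
    i1, i2, i3, i4, i5 = i
    return sum(pi[u] * M1[u][i1] * M2[u][i2] * M6[u][v] * M3[v][i3]
               * M7[v][w] * M4[w][i4] * M5[w][i5]
               for u, v, w in product(range(2), repeat=3))

patterns = list(product(range(2), repeat=5))
p = {i: joint(i) for i in patterns}
informative = [i for i in patterns if sum(i) in (2, 3)]   # each state at >= 2 taxa
scale = sum(p[i] for i in informative)
q_all = {i: p[i] / scale for i in patterns}    # extended q_i (ground truth)
q_inf = {i: q_all[i] for i in informative}     # what is actually observable

def g(q, s):
    return q[tuple(int(c) for c in s)]

def det2(q, r1, r2, c1, c2):
    """2x2 minor with row labels r1, r2 (states i1 i2) and column labels c1, c2 (i3 i4 i5)."""
    return g(q, r1 + c1) * g(q, r2 + c2) - g(q, r1 + c2) * g(q, r2 + c1)

def recover_q11011(q):
    D1 = det2(q, "10", "11", "001", "010")
    D2 = det2(q, "01", "11", "001", "010")
    D3 = det2(q, "01", "10", "001", "010")
    if D3 == 0:
        raise ValueError("required 2x2 minor vanishes; try another pair of columns")
    return (g(q, "10011") * D2 - g(q, "01011") * D1) / D3

print(recover_q11011(q_inf) == q_all[(1, 1, 0, 1, 1)])
```

Passing only `q_inf` to the recovery function guards against accidentally using a non-informative entry, since any such lookup would raise a `KeyError`.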



Combined with an argument like that for Theorem 6, this lemma may be
used to quickly establish that numerical parameters are generically identifiable
for both the GM2pars-inf and M2pars-inf models on 5-taxon trees. Generic
identifiability means that the subset of parameter space for which identifiability may
not hold is of measure zero within the full parameter space. By Lemma 12,
numerical parameter identifiability may fail only when at least one of the four
matrices has rank < 2, a condition which can be equivalently phrased in terms of
the vanishing of a finite set of polynomials in the $q_{\mathbf{i}}$, obtained as certain products
of $2 \times 2$ minors of the $L_i$. Composing these polynomials with the polynomial
parameterization map for the model, we find the set of all non-identifiable
parameter choices lies within the zero set of a finite set of polynomials, i.e., it
lies within an algebraic variety. Exhibiting a single choice of parameters for
which these matrices all have rank 2, then, will establish that this is a proper
subvariety of parameter space, and hence is of lower dimension than the full
parameter space, with Lebesgue measure zero. Though we omit presenting such
an example here, it is easy to choose rational parameter values and calculate
with exact arithmetic to establish that such examples exist.

We next investigate for what parameters any of the matrices $L_i$ of Lemma
12 has rank < 2. This will establish generic identifiability in another way, by
giving an explicit characterization of those parameters for which identifiability
might not hold. Although our analysis will not give complete understanding
of all cases, we show that while generic parameters are identifiable, there are
indeed cases of GM2pars-inf parameters that are not identifiable.

Consider first the submatrix

\[ L_1 = \begin{pmatrix} q_{01001} & q_{01010} & q_{01100} \\ q_{10001} & q_{10010} & q_{10100} \end{pmatrix} \]

and root the tree at the internal node closest to $a_1$ and $a_2$ in Figure 3. We use
$M_i$ for the Markov matrix on the terminal edge to $a_i$, $M_6$ for the Markov matrix
on the internal edge leading from the root, and $M_7$ for the Markov matrix on
the other internal edge. Let

\[ C_1 = \begin{pmatrix} M_1(0,0)M_2(0,1) & M_1(0,1)M_2(0,0) \\ M_1(1,0)M_2(1,1) & M_1(1,1)M_2(1,0) \end{pmatrix} \]

and

\[ C_2 = \begin{pmatrix} M_4(0,0)M_5(0,1) & M_4(0,1)M_5(0,0) \\ M_4(1,0)M_5(1,1) & M_4(1,1)M_5(1,0) \end{pmatrix}. \]

Then

\[ L_1 = C_1^T \operatorname{diag}(\pi) M_6 D_1, \tag{7} \]

where

\[ D_1 = \begin{pmatrix} b_1 & b_2 & b_3 \end{pmatrix} \]

is a $2 \times 3$ matrix with columns $b_i$ given by

\[ \begin{pmatrix} b_1 & b_2 \end{pmatrix} = \operatorname{diag}(M_3(\cdot, 0)) M_7 C_2 \]

and

\[ b_3 = \operatorname{diag}(M_3(\cdot, 1)) M_7 \begin{pmatrix} M_4(0,0)M_5(0,0) \\ M_4(1,0)M_5(1,0) \end{pmatrix}. \]

(Here $M(\cdot, i)$ denotes the $i$th column of $M$.)

Thus the first two columns of $L_1$ are given by

\[ C_1^T \operatorname{diag}(\pi) M_6 \operatorname{diag}(M_3(\cdot, 0)) M_7 C_2. \]

Note all matrices in this product have rank 2 except possibly the $C_i$. Thus if
both $C_i$ have rank 2, so does $L_1$.
A similar argument applies to the other $L_i$, yielding the following explicit
statement of generic identifiability:

Theorem 13. The model GM2pars-inf has identifiable numerical parameters for
all parameter values such that both $C_1$ and $C_2$ have rank 2.

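We now investigate under what circumstances the $C_i$ fail to be of rank 2.
With

\[ M_1 = \begin{pmatrix} 1 - a_1 & a_1 \\ a_2 & 1 - a_2 \end{pmatrix}, \quad
M_2 = \begin{pmatrix} 1 - b_1 & b_1 \\ b_2 & 1 - b_2 \end{pmatrix}, \]

where $0 < a_i, b_i < 1$,

\[ C_1 = \begin{pmatrix} (1 - a_1) b_1 & a_1 (1 - b_1) \\ a_2 (1 - b_2) & (1 - a_2) b_2 \end{pmatrix}. \]

Thus $\det C_1 = 0$ means $(1 - a_1)(1 - a_2) b_1 b_2 = a_1 a_2 (1 - b_1)(1 - b_2)$, so

\[ \frac{a_1 a_2}{(1 - a_1)(1 - a_2)} = \frac{b_1 b_2}{(1 - b_1)(1 - b_2)}. \]

Letting $\alpha_i = \frac{a_i}{1 - a_i}$ and $\beta_i = \frac{b_i}{1 - b_i}$, then $0 < \alpha_i, \beta_i < \infty$ and these are in 1-1
correspondence with $a_i, b_i$. We now have

Lemma 14. The matrix $C_1$ has rank 1 if, and only if, $\alpha_1 \alpha_2 = \beta_1 \beta_2$.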
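
A quick exact-arithmetic check of Lemma 14 follows. It is a sketch with hypothetical parameter values and helper names (`C1`, `det2`, `odds`, `from_odds`): three of the four off-diagonal entries are chosen freely, and the fourth is then solved from the odds condition, which forces the determinant of $C_1$ to vanish.

```python
from fractions import Fraction as Fr

def C1(a1, a2, b1, b2):
    """The matrix C_1 built from the off-diagonal entries of M_1 and M_2."""
    return [[(1 - a1) * b1, a1 * (1 - b1)],
            [a2 * (1 - b2), (1 - a2) * b2]]

def det2(C):
    return C[0][0] * C[1][1] - C[0][1] * C[1][0]

def odds(x):            # alpha_i = a_i / (1 - a_i), beta_i = b_i / (1 - b_i)
    return x / (1 - x)

def from_odds(r):       # inverse map, from (0, infinity) back to (0, 1)
    return r / (1 + r)

a1, a2, b1 = Fr(1, 10), Fr(1, 5), Fr(3, 10)      # hypothetical free choices
beta2 = odds(a1) * odds(a2) / odds(b1)           # forced by alpha1*alpha2 = beta1*beta2
b2 = from_odds(beta2)

print(b2, det2(C1(a1, a2, b1, b2)))
```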
Thus to find examples where $C_1$ has rank 1 we may pick $M_1$ (equivalently
$\alpha_1, \alpha_2$, or $a_1, a_2$) arbitrarily, and then have only one free parameter to pick $M_2$
(equivalently, we may pick $\beta_1$ or $b_1$, and then $\beta_2$ and $b_2$ are determined).

If we avoid such 'bad' parameter choices for both the Markov matrices on
the cherry of taxa 1 and 2 and the Markov matrices on the cherry of taxa 4 and
5, then GM2pars-inf has identifiable parameters.

Corollary 15. Numerical parameters of the model GM2pars-inf on the 5-taxon
tree are identifiable except possibly on a codimension 1 algebraic subvariety of
parameter space. This subvariety is the union of 2 irreducible varieties; one is
explicitly characterized by the condition of Lemma 14 on the Markov matrices
$M_1, M_2$, and the other by a similar condition on $M_4, M_5$.

We next investigate whether identifiability actually fails for the parameter 
choices indicated in the corollary, or if it is only our proof that fails. 

Consider the extreme case where $M_1, M_2, M_4, M_5$ have been chosen so that
both $C_1$ and $C_2$ have rank 1. Then from an expression similar to equation (7),
the fact that $C_1$ has rank 1 implies that the middle two rows of the matrix $F_1$,
and hence of $\tilde{F}_1$, must be dependent. Thus if we knew the second row of $\tilde{F}_1$,
and one of the entries in the third row, we could determine the rest of the third
row. Similar comments apply to the middle two columns of $\tilde{F}_2$, using that $C_2$
is of rank 1.

This observation shows that if we project from the 20 coordinates $\{q_{\mathbf{i}}\}_{\mathbf{i} \in \mathcal{I}}$ to
the 12 coordinates shown in the array

\[ \begin{pmatrix}
- & - & - & q_{00011} & - & q_{00101} & * & q_{00111} \\
- & q_{01001} & q_{01010} & q_{01011} & q_{01100} & q_{01101} & * & - \\
- & q_{10001} & * & * & * & * & * & - \\
q_{11000} & q_{11001} & * & - & q_{11100} & - & - & -
\end{pmatrix} \]

obtained by deleting entries in $\tilde{F}_1$, then this projection will be injective on
distributions arising from GM2pars-inf parameters for which both $C_1$ and $C_2$
have rank 1. In the above array '-' marks parsimony-noninformative entries,
and '*' parsimony-informative ones that can be inferred from other entries shown
under the assumption that $C_1$ and $C_2$ have rank 1. To establish that GM2pars-inf
is not identifiable for all parameters, it is thus enough to argue that if we know
$C_1$ and $C_2$ have rank 1, identifiability of parameters is impossible from these 12
coordinates.

Note that the restricted parameter space for the GM2pars-inf model where
$C_1, C_2$ have rank 1 has dimension 13: the sum of $2 \cdot 2 - 1 = 3$ parameters for each
cherry, 2 parameters for each of the 3 other edges, and 1 parameter for the root
distribution. Thus each 13-dimensional neighborhood of a point in the interior
of the restricted parameter space has an image that is of dimension at most 12.
Thus the parameters cannot be identifiable, as the map is infinite-to-one.

Proposition 16. There exist distributions arising from the GM2pars-inf model 
on a 5-taxon tree with infinite fiber under the parameterization map. That is, 
infinitely many choices of parameters can lead to the same distribution. 
We now use our earlier theorems, which have all concerned the model GM2,
to deduce results on the model M2pars-inf.

To specialize Corollary 15 to M2pars-inf, note that the condition of Lemma
14 simplifies to $M_1 = M_2$ for this model. Thus we obtain the following.
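
The simplification can be spelled out in one line. The sketch below assumes, as is standard for the symmetric 2-state model, that each Markov matrix has equal off-diagonal entries, so that $a_1 = a_2 =: a$ for $M_1$ and $b_1 = b_2 =: b$ for $M_2$; the symbols $a$ and $b$ are introduced here only for illustration.

```latex
\[
\alpha_1\alpha_2 = \beta_1\beta_2
\iff \left(\frac{a}{1-a}\right)^{2} = \left(\frac{b}{1-b}\right)^{2}
\iff \frac{a}{1-a} = \frac{b}{1-b}
\iff a = b
\iff M_1 = M_2 .
\]
```

The second equivalence uses that both odds are positive, and the third that $x \mapsto x/(1-x)$ is strictly increasing on $(0, 1)$.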

Corollary 17. For the M2pars-inf model on the 5-taxon tree, suppose $M_1 \neq M_2$
and $M_4 \neq M_5$. Then the numerical parameters are identifiable.

Rather interestingly, in the case of a molecular clock assumption, with a
root located anywhere on the tree, the potential bad cases in the statement
above, $M_1 = M_2$ or $M_4 = M_5$, actually arise. It is an open question whether
identifiability actually fails for M2pars-inf in such cases. This underscores that
what may appear to be the simplest biological assumptions may well lead to
undesirable mathematical behavior, due to special symmetries.

E. Identifiability of numerical parameters: large trees 

We turn now to establishing Lemma 19, the key technical point needed in
the proof of Theorem 6. While the method of proof is similar to what appears
in Appendix D, we generalize to models with an arbitrary number of states, and
deal with larger trees in order to avoid obtaining a theorem that only holds for
generic parameters. This complicates the presentation, but introduces few new
ideas.

We require some additional terminology. 

Definition. A binary tree is said to have an $(m, n)$ split if deleting one edge
partitions the taxa into sets of size $m$ and $n$ according to connected components
of the resulting graph. A non-binary tree is said to have an $(m, n)$ split if some
binary resolution of it does.

Lemma 18. $T$ has at least 7 taxa if, and only if, $T$ has an $(m, n)$ split with
$m \geq 4$ and $n \geq 3$.

Proof. We may assume $T$ is binary. Suppose first $T$ has at least 7 taxa. We
consider three cases based on the number of cherries in $T$.

If $T$ has exactly two cherries, then $T$ is a caterpillar tree and the forward
implication is clear.

If $T$ has exactly three cherries, then $T$ is obtained by grafting one or more
additional edges to interior edges of the tree $((a, b), (c, d), (e, f))$ and the forward
implication is again clear.

If $T$ has four or more cherries, then $T$ is obtained by grafting rooted trees
to the tree $(((a, b), (c, d)), ((e, f), (g, h)))$ and the forward implication is clear.

The converse is clear. □
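
Lemma 18 can be checked on individual trees with a few lines of code. In this sketch a tree is written as nested tuples of leaf names, each subtree corresponding to one edge of the unrooted tree; the representation and the helper names `leaf_count`, `splits`, and `has_4_3_split` are assumptions made for illustration, and only the edges actually present are examined (so a non-binary tree should be resolved first).

```python
def leaf_count(t):
    return 1 if isinstance(t, str) else sum(leaf_count(c) for c in t)

def splits(tree):
    """Set of split sizes (larger side, smaller side) induced by the edges of the tree."""
    n = leaf_count(tree)
    sizes = set()
    def walk(t):
        for child in (() if isinstance(t, str) else t):
            m = leaf_count(child)
            sizes.add((max(m, n - m), min(m, n - m)))
            walk(child)
    walk(tree)
    return sizes

def has_4_3_split(tree):
    """True if some edge separates at least 4 taxa from at least 3 taxa."""
    return any(big >= 4 and small >= 3 for big, small in splits(tree))

caterpillar7 = ((("a", "b"), "c"), "d", ("e", ("f", "g")))
three_cherries6 = (("a", "b"), ("c", "d"), ("e", "f"))
balanced8 = ((("a", "b"), ("c", "d")), (("e", "f"), ("g", "h")))
print(has_4_3_split(caterpillar7), has_4_3_split(three_cherries6), has_4_3_split(balanced8))
```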

We use this to prove the lemma which is the key ingredient of Theorem 6. 

Lemma 19. Suppose $T$ is an $n$-taxon tree with $n \geq 7$. Then the $q_{\mathbf{i}}$ for $\mathbf{i} \in \mathcal{I}$
arising from some choice of GM$k$pars-inf parameters on $T$ uniquely determine
the $q_{\mathbf{i}}$ for $\mathbf{i} \notin \mathcal{I}$.

Proof. We may assume $T$ is binary by passing to a binary resolution of it, noting 
that the probability distributions arising from the model on the unresolved tree 
also arise from the model on the resolved tree by setting Markov matrices on 
new edges to the identity matrix. 

Let $e$ denote some edge of $T$ corresponding to an $(m, n-m)$ split with $m \geq 4$, 
$n - m \geq 3$; such an edge exists by Lemma 18. 

Recall the more general version of Theorem 10 for GM$k$ on $n$-taxon binary 
trees (Allman and Rhodes, 2008b): If $P$ denotes the $n$-dimensional $k \times k \times \cdots \times k$ 
joint distribution tensor with entries $p_{\mathbf{i}}$, where $\mathbf{i}$ denotes a pattern, let $F_e$ be the 
matrix obtained by flattening $P$ along $e$. Then all $(k+1) \times (k+1)$ minors of 
$F_e$ are zero. 
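The vanishing of these minors can be illustrated numerically on the smallest interesting example. The sketch below is an illustration only, assuming $k = 2$, a 4-taxon tree ((a, b), (c, d)) flattened along its internal edge, and arbitrarily chosen symmetric Markov matrices; the parameter values and the helper names `markov`, `joint`, and `det` are hypothetical. Exact rational arithmetic is used so that the minors are exactly zero rather than merely small.

```python
from fractions import Fraction
from itertools import combinations, product

k = 2  # number of states (illustrative choice)

def markov(p):
    """Symmetric 2-state Markov matrix with off-diagonal entry p (hypothetical parameter)."""
    p = Fraction(p)
    return [[1 - p, p], [p, 1 - p]]

# Hypothetical parameters on ((a, b), (c, d)), rooted at one end of the internal edge e.
root = [Fraction(1, 3), Fraction(2, 3)]
M_a, M_b = markov(Fraction(1, 10)), markov(Fraction(1, 5))
M_c, M_d = markov(Fraction(3, 10)), markov(Fraction(1, 4))
M_e = markov(Fraction(1, 7))

def joint(a, b, c, d):
    """Probability of pattern (a, b, c, d), summing over the internal node states x and y."""
    return sum(root[x] * M_a[x][a] * M_b[x][b] * M_e[x][y] * M_c[y][c] * M_d[y][d]
               for x in range(k) for y in range(k))

def det(A):
    """Determinant by Laplace expansion along the first row (adequate for tiny matrices)."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))

# Flatten along e: rows indexed by states at (a, b), columns by states at (c, d).
index = list(product(range(k), repeat=2))
flat = [[joint(a, b, c, d) for (c, d) in index] for (a, b) in index]

# Collect every (k+1) x (k+1) minor of the k^2 x k^2 flattening.
minors = [det([[flat[r][c] for c in cs] for r in rs])
          for rs in combinations(range(k * k), k + 1)
          for cs in combinations(range(k * k), k + 1)]
all_vanish = all(m == 0 for m in minors)
total = sum(sum(row) for row in flat)
```

The flattening factors through the $k$ states at the edge, so its rank is at most $k$, which is what forces the larger minors to vanish.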

Replacing each $p_{\mathbf{i}}$ in $F_e$ by $q_{\mathbf{i}}$ to obtain a matrix $\widetilde{F}_e$ preserves the vanishing 
of these minors, due to the homogeneity of determinants. 
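The homogeneity step can be written out in one line. This is a sketch under the assumption, suggested by the appeal to homogeneity, that the $q$ values are a common non-zero multiple $c$ of the $p$ values; the symbols $c$ and $A$ are introduced here only for illustration.

```latex
% Assumption: q_i = c\,p_i for every pattern i, with a single constant c \neq 0.
% A denotes any (k+1) x (k+1) submatrix of the flattening of P along e.
\det(cA) \;=\; c^{\,k+1}\,\det(A) \;=\; 0 .
```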

For each parsimony-noninformative pattern $\mathbf{i} \notin \mathcal{I}$, we will produce a 
$(k+1) \times (k+1)$ submatrix of $\widetilde{F}_e$ that involves $q_{\mathbf{i}}$ but no other unknown $q_{\mathbf{j}}$. We will 
furthermore ensure that the $k \times k$ minor of this submatrix that uses rows and 
columns complementary to those of $q_{\mathbf{i}}$ is non-zero. Then the vanishing of the 
$(k+1) \times (k+1)$ determinant leads to a formula for $q_{\mathbf{i}}$ in terms of known $q_{\mathbf{j}}$, as 
in Section D. Thus we may recover all unknown values of $q_{\mathbf{i}}$, $\mathbf{i} \notin \mathcal{I}$. 
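To make the recovery step concrete: if a singular $(k+1) \times (k+1)$ matrix has an unknown corner entry $x$, first row $(x, r)$, first column $(x, c)^T$, and a non-singular complementary $k \times k$ block $L$, then expanding the determinant gives $x = r L^{-1} c$. The following minimal sketch checks this on an arbitrary rank-2 integer matrix standing in for the submatrix of the flattening; the matrices `A` and `B` and the helper names `solve` and `recover_corner` are hypothetical.

```python
from fractions import Fraction

def solve(L, c):
    """Solve L y = c by Gauss-Jordan elimination over exact fractions.

    Assumes L is non-singular; a singular L makes the pivot search raise StopIteration.
    """
    k = len(L)
    aug = [[Fraction(v) for v in row] + [Fraction(c[r])] for r, row in enumerate(L)]
    for col in range(k):
        pivot = next(r for r in range(col, k) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [v / piv for v in aug[col]]
        for r in range(k):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [aug[r][k] for r in range(k)]

def recover_corner(M):
    """Recover M[0][0] from the other entries, given det(M) = 0 and a non-singular complementary block."""
    r = M[0][1:]                   # known entries of the first row
    c = [row[0] for row in M[1:]]  # known entries of the first column
    L = [row[1:] for row in M[1:]] # complementary k x k block
    y = solve(L, c)
    return sum(Fraction(a) * b for a, b in zip(r, y))

# A 3 x 3 matrix of rank 2 (so k = 2), built as a product of 3 x 2 and 2 x 3 factors.
A = [[1, 2], [3, 1], [2, 5]]
B = [[2, 1, 3], [1, 4, 1]]
M = [[sum(A[i][t] * B[t][j] for t in range(2)) for j in range(3)] for i in range(3)]

true_value = M[0][0]
M[0][0] = None  # pretend the corner entry is the unknown q value
recovered = recover_corner(M)
```

The non-vanishing of the complementary minor is exactly what makes the linear system solvable, which is why the proof checks it in each case.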

To produce these $(k+1) \times (k+1)$ submatrices, we must fix additional 
notation. With $e$ the fixed edge described above, we may assume our taxa 
are labeled so that the partition of taxa induced by removing $e$ has sets 
$S_1 = \{a_1, \ldots, a_m\}$ and $S_2 = \{a_{m+1}, \ldots, a_n\}$, so $\widetilde{F}_e$ has rows indexed by $[k]^m$ and 
columns by $[k]^{n-m}$. We may further assume taxa $a_{m-1}$ and $a_m$ form a cherry, 
as do $a_{n-1}$ and $a_n$, and the other taxa in $S_1$ are numbered in a manner consistent 
with the diagram of the subtree of $T$ shown in Figure 4, and similarly for those 
taxa in $S_2$. Thus taxa are numbered in order of where the path from the deleted 
edge $e$ to the taxa leaves the path from the deleted edge to $a_m$ (respectively 
$a_n$). 




Figure 4: Assumed ordering of taxa in the subtree of T to one side of e. 



For any pattern $\mathbf{i} \in [k]^n$, let $\mathbf{i}_1 = \operatorname{proj}_{S_1}(\mathbf{i}) \in [k]^m$ and 
$\mathbf{i}_2 = \operatorname{proj}_{S_2}(\mathbf{i}) \in [k]^{n-m}$. 

The values of $q_{\mathbf{i}}$ are known if $\mathbf{i}$ has among its components at least 2 states 
that appear at least twice each. In Cases 1-4 below, we will use these to first 
determine those $q_{\mathbf{i}}$ for which $\mathbf{i}$ has exactly one component that appears at least 
twice, but $\mathbf{i}$ is not a constant pattern. Without loss of generality, we may assume 
the component that appears at least twice in $\mathbf{i}$ is 1, yet $\mathbf{i} \neq (1, 1, \ldots, 1)$. 

Case 1: No 1 appears in $\mathbf{i}_1$, so at least two 1s appear in $\mathbf{i}_2$. All components of 
$\mathbf{i}_1$ must be distinct, so let $a \neq b$ be two of these. Consider the row indices 

$\mathbf{i}_1$, and for each $i \in [k]$, $\mathbf{j}_i = (a, a, \ldots, a, b, i)$, 

and the column indices 

$\mathbf{i}_2$, and for each $i \in [k]$, $\mathbf{k}_i = (a, a, \ldots, a, b, i)$. 

Then the $(k+1) \times (k+1)$ submatrix of $\widetilde{F}_e$ formed by these rows and columns 
has all known entries except $q_{\mathbf{i}}$. 
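The claim that only one entry of this submatrix is unknown can be checked directly on an example. The sketch below assumes the smallest sizes allowed by the split ($m = 4$, $n - m = 3$) and $k = 5$ states, with a pattern chosen to satisfy the hypotheses of Case 1; the names `is_parsimony_informative` and `case1_indices` are hypothetical, and "known" is taken to mean that at least two states each appear at least twice.

```python
from collections import Counter

def is_parsimony_informative(pattern):
    """True if at least two states each appear at least twice in the pattern."""
    counts = Counter(pattern)
    return sum(1 for c in counts.values() if c >= 2) >= 2

def case1_indices(i1, i2, k):
    """Row and column indices of the Case 1 submatrix for the pattern (i1, i2).

    Assumes i1 has distinct components, none equal to 1, and takes a, b to be its first two.
    """
    m, r = len(i1), len(i2)
    a, b = i1[0], i1[1]
    rows = [i1] + [(a,) * (m - 2) + (b, i) for i in range(1, k + 1)]
    cols = [i2] + [(a,) * (r - 2) + (b, i) for i in range(1, k + 1)]
    return rows, cols

k = 5
i1 = (2, 3, 4, 5)  # no 1 appears, all components distinct
i2 = (1, 1, 1)     # at least two 1s appear
rows, cols = case1_indices(i1, i2, k)

# Positions in the (k+1) x (k+1) submatrix whose pattern is not parsimony-informative.
unknown = [(r, c) for r, row in enumerate(rows) for c, col in enumerate(cols)
           if not is_parsimony_informative(row + col)]
```

Only the position indexed by $(\mathbf{i}_1, \mathbf{i}_2)$ fails the test, in line with the statement above.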

We further claim the $k \times k$ submatrix $L$ with entries $q_{(\mathbf{j}_i, \mathbf{k}_j)}$, $i, j \in [k]$, has 
non-zero determinant. To see this, note that by viewing the tree $T$ as rooted at 
the end of $e$ closest to taxon $a_1$, $L$ has a matrix factorization 

$$L = C_1^T \operatorname{diag}(\pi)\, C_2, \tag{8}$$ 

where the entries of $C_1$ give probabilities of producing the patterns at the taxa 
in $S_1$ conditioned on the root state, and the entries of $C_2$ similarly give 
conditional probabilities of producing the patterns at the taxa in $S_2$. Referring to 
Figure 4, we find 

$$C_1 = D_1 M_{e_1} D_2 \cdots D_{s-1} M_{e_{s-1}} D_s M_{e_s},$$ 

where each $D_i$ is a diagonal matrix whose entries give the probabilities of states 
at the $i$th node along the path from the root to $a_m$ producing the particular 
pattern $\operatorname{proj}_{B_i}(a, \ldots, a, a, b)$ on the taxa in the set $B_i$ labeling the leaves on the 
subtree branching off from that node. By our assumptions on parameters, all 
matrices in this product are non-singular, so $C_1$ is as well. A similar 
product shows $C_2$ is also non-singular, so by equation (8) the matrix $L$ has non-zero 
determinant as claimed. 
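The last step is multiplicativity of the determinant applied to equation (8). Written out, assuming every entry $\pi_s$ of the root distribution is non-zero (the index $s$ is introduced here only for illustration):

```latex
% Assumption: each \pi_s \neq 0, and C_1, C_2 are the non-singular k x k matrices of equation (8).
\det L \;=\; \det\bigl(C_1^{T}\bigr)\,\det\bigl(\operatorname{diag}(\pi)\bigr)\,\det(C_2)
       \;=\; \det(C_1)\,\Bigl(\prod_{s=1}^{k}\pi_s\Bigr)\,\det(C_2) \;\neq\; 0 .
```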

Case 2: Exactly one 1 appears in $\mathbf{i}_1$, so at least one 1 appears in $\mathbf{i}_2$. Again all 
components of $\mathbf{i}_1$ must be distinct, so let $a \neq 1$ be one of these. 
Then considering the row indices 

$\mathbf{i}_1$, and for each $i \in [k]$, $\mathbf{j}_i = (1, 1, \ldots, 1, a, a, i)$, 

and the column indices 

$\mathbf{i}_2$, and for each $i \in [k]$, $\mathbf{k}_i = (1, 1, \ldots, 1, a, i)$, 

we obtain the needed submatrix. 

Case 3: At least two 1s appear in $\mathbf{i}_1$, and $\mathbf{i}_2$ has at least one component $a \neq 1$. 
Let $b \neq a$ denote any other component of $\mathbf{i}_2$ (so $b = 1$ is possible). Then 
considering the row indices 

$\mathbf{i}_1$, and for each $i \in [k]$, $\mathbf{j}_i = (b, b, \ldots, b, a, i)$, 

and the column indices 

$\mathbf{i}_2$, and for each $i \in [k]$, $\mathbf{k}_i = (a, a, \ldots, a, i)$, 

we obtain the needed submatrix. 

Case 4: At least two 1s appear in $\mathbf{i}_1$, and $\mathbf{i}_2$ has all components 1. Since we 
are assuming $\mathbf{i}$ is not constant, $\mathbf{i}_1$ must have some component $a \neq 1$. Then 
considering the row indices 

$\mathbf{i}_1$, and for each $i \in [k]$, $\mathbf{j}_i = (1, 1, \ldots, 1, a, a, i)$, 

and the column indices 

$\mathbf{i}_2$, and for each $i \in [k]$, $\mathbf{k}_i = (1, 1, \ldots, 1, a, i)$, 

we obtain the needed submatrix. 

At this point all $q_{\mathbf{i}}$ for all non-constant patterns $\mathbf{i}$ with at least one repeated 
component are known. We next use these to determine $q_{\mathbf{i}}$ for a constant pattern 
$\mathbf{i}$, which we may assume is all 1s. 

Case 5: All components of $\mathbf{i}_1$ and $\mathbf{i}_2$ are 1s. Considering the row indices 

$\mathbf{i}_1$, and for each $i \in [k]$, $\mathbf{j}_i = (1, 1, \ldots, 1, 2, i)$, 

and the column indices 

$\mathbf{i}_2$, and for each $i \in [k]$, $\mathbf{k}_i = (1, 1, \ldots, 1, 2, i)$, 

we obtain a submatrix all of whose entries except $q_{\mathbf{i}}$ are already known. The 
non-singularity of the relevant $k \times k$ minor is again shown as in Case 1. 

A final case shows we can determine the remaining $q_{\mathbf{i}}$, which have no repeated 
components. 

Case 6: No components of $\mathbf{i}$ are repeated. Considering the row indices 

$\mathbf{i}_1$, and for each $i \in [k]$, $\mathbf{j}_i = (1, 1, \ldots, 1, i)$, 

and the column indices 

$\mathbf{i}_2$, and for each $i \in [k]$, $\mathbf{k}_i = (1, 1, \ldots, 1, i)$, 

we obtain a submatrix all of whose entries except $q_{\mathbf{i}}$ are already known, whose 
relevant $k \times k$ minor is similarly shown to be non-singular. □ 

References 

Allman, E., Rhodes, J., Jan 2008a. Identifying evolutionary trees and sub- 
stitution parameters for the general Markov model with invariable sites. 
Mathematical Biosciences 211 (1), 18-33. 

URL http://linkinghub.elsevier.com/retrieve/pii/S0025556407001897 



Allman, E. S., Holder, M. T., Rhodes, J. A., 2009. Supplementary material, 
Maple worksheet, http://www.dms.uaf.edu/~jrhodes/papers/AHRsup.mw. 

Allman, E. S., Rhodes, J. A., 2003. Phylogenetic invariants for the general 
Markov model of sequence mutation. Math. Biosci. 186, 113-144. 

Allman, E. S., Rhodes, J. A., 2007. Phylogenetic invariants. In: Gascuel, O., 
Steel, M. (Eds.), Reconstructing Evolution: New Mathematical and Computational 
Advances. Oxford University Press, Oxford, pp. 108-147. 

Allman, E. S., Rhodes, J. A., 2008b. Phylogenetic ideals and varieties 
for the general Markov model. Adv. in Appl. Math. 40 (2), 127-148, 
arXiv:math.AG/0410604. 

Buneman, P., 1971. The recovery of trees from measures of dissimilarity. In: 
Mathematics in the Archeological and Historical Sciences. Edinburgh Univer- 
sity Press, Edinburgh, pp. 387-395. 

Cavender, J. A., 1978. Taxonomy with confidence. Mathematical Biosciences 
40, 271-280. 

Cavender, J. A., Felsenstein, J., 1987. Invariants of phylogenies in a simple case 
with discrete states. J. of Class. 4, 57-71. 

Chang, J. T., 1996. Full reconstruction of Markov models on evolutionary trees: 
identifiability and consistency. Math. Biosci. 137 (1), 51-73. 

Csűrös, M., Holey, J. A., Rogozin, I. B., 2007. In search of lost introns. 
Bioinformatics 23, 187-196. 

Diallo, A. B., Makarenkov, V., Blanchette, M., 2007. Exact and heuristic algo- 
rithms for the indel maximum likelihood problem. Journal of Computational 
Biology 14 (4), 446-461. 

Farris, J. S., 1973. A probability model for inferring evolutionary trees. System- 
atic Zoology 22, 250-256. 

Felsenstein, J., 1978. Cases in which parsimony or compatibility methods will 
be positively misleading. Systematic Zoology 27, 401-410. 

Felsenstein, J., Jan 1992. Phylogenies from restriction sites: a maximum- 
likelihood approach. Evolution 46, 159-173. 

URL http://cat.inist.fr/?aModele=afficheN&cpsidt=5208269 

Hennig, W., 1966. Phylogenetic Systematics. University of Illinois Press. 

Jukes, T. H., Cantor, C. R., 1969. Evolution of protein molecules. In: Munro, 
H. (Ed.), Mammalian protein metabolism. Academic Press, New York, pp. 
21-132. 



Lewis, P. O., 2001. A likelihood approach to estimating phylogeny from discrete 
morphological character data. Systematic Biology 50 (6), 913-925. 

Massey, W. S., 1991. A basic course in algebraic topology. Vol. 127 of Graduate 
Texts in Mathematics. Springer-Verlag, New York. 

Mayr, E., Linsley, E. G., Usinger, R. L., 1953. Methods and Principles of 
Systematic Zoology. McGraw-Hill. 

Neyman, J., 1971. Molecular studies of evolution: A source of novel statistical 
problems. In: Gupta, S., Yackel, J. (Eds.), Statistical Decision Theory and 
Related Topics. Academic Press, New York, pp. 1-27. 

Nylander, J. A. A., Ronquist, F., Huelsenbeck, J. P., Nieves-Aldrey, J. L., 2004. 
Bayesian phylogenetic analysis of combined data. Systematic Biology 53 (1), 
47-67. 

Poe, S., Wiens, J. J., 2000. Character selection and the methodology of mor- 
phological phylogenetics. In: Wiens, J. J. (Ed.), Phylogenetic analysis of mor- 
phological data. Smithsonian Institution Press, Washington, D.C., pp. 20-36. 

Ramirez, M., 2007. Homology as a parsimony problem: a dynamic homology 
approach for morphological data. Cladistics 23, 588-612. 

URL http://www.blackwell-synergy.com/doi/abs/10.1111/j.1096-0031.2007.00162.x 

Rannala, B., 2002. Identifiability of parameters in MCMC Bayesian inference of 
phylogeny. Syst. Biol. 51 (5), 754-760. 

Rieppel, O., Kearney, M., Jan 2002. Similarity. Biological Journal of the 
Linnean Society 75, 59-82. 

URL http://www.blackwell-synergy.com/doi/abs/10.1046/j.1095-8312.2002.00006.x 

Rieppel, O., Kearney, M., Mar 2007. The poverty of taxonomic characters. 
Biology & Philosophy 22 (1), 95-113. 

URL http://www.springerlink.com/content/hl045281150h849p/ 

Ronquist, F. R., Huelsenbeck, J. P., 2003. MRBAYES 3: Bayesian phylogenetic 
inference under mixed models. Bioinformatics 19 (12), 1574-1575. 

Schulmeister, S., Aug 2004. Inconsistency of maximum parsimony revisited. 
Systematic Biology 53 (4), 512-528. 
URL http://www.jstor.org/stable/4135421 

Semple, C., Steel, M., 2003. Phylogenetics. Vol. 24 of Oxford Lecture Series in 
Mathematics and its Applications. Oxford University Press, Oxford. 

Sereno, P., 2007. Logical basis for morphological characters in phylogenetics. 
Cladistics 23, 565-587. 

URL http://www.blackwell-synergy.com/doi/abs/10.1111/j.1096-0031.2007.00161.x 



Steel, M., Jan 1994. Recovering a tree from the leaf colourations it generates 
under a Markov model. Appl. Math. Letters 7 (2), 19-23. 

URL http://www.math.canterbury.ac.nz/~mathmas/research/markov3.pdf 

Steel, M., Székely, L., Hendy, M., 1994. Reconstructing trees from sequences 
whose sites evolve at variable rates. J. Comput. Biol. 1 (2), 153-163. 

Steel, M. A., Hendy, M. D., Penny, D., 1993. Parsimony can be consistent! Sys. 
Biol. 42 (4), 581-587. 

Thorne, J. L., Kishino, H., Felsenstein, J., 1991. An evolutionary model for the 
maximum likelihood alignment of DNA sequences. J. Mol. Evol. 33, 114-124. 

Wald, A., 1949. Note on the consistency of the maximum likelihood estimate. 
Ann. Math. Statistics 20, 595-601. 

Wiley, E. O., 2008. Homology, identity and transformation. In: Arratia, G., 
Schultze, H.-P., Wilson, M. V. H. (Eds.), Mesozoic Fishes 4 - Homology and 
Phylogeny. Verlag Dr. Friedrich Pfeil, München, pp. 9-21. 


